[Skiboot] [PATCH] opal/hmi: Display correct chip id while printing NPU FIRs.

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Mon Jun 4 16:09:47 AEST 2018


On 06/01/2018 08:54 PM, Balbir Singh wrote:
> On Thu, May 31, 2018 at 6:34 PM, Mahesh J Salgaonkar
> <mahesh at linux.vnet.ibm.com> wrote:
>> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>>
>> HMIs for NPU xstops are broadcasted to all chips. All cores on all the
>> chips receive HMI. HMI handler correctly identifies and extracts the
>> NPU FIR details from affected chip, but while printing FIR data it
>> prints chip id and location code details of this_cpu()->chip_id which
>> may not be correct. This patch fixes this issue.
>>
> 
> The core does not matter here, does it? It's the NPU, it's one per
> socket/processor
> If fir, mask and action match, this_cpu()->chip_id's location code
> should be correct.
> Am I missing something? Have you seen examples of this printing the wrong thing
> or did you catch this via code-review?

A bug was reported where the submitter sees the same NPU FIR value shown
for both chip ids, which creates confusion about which NPU got the error.
The code checks the pMisc Receive Malfunction Alert to find the correct
chip id (flat_chip_id) where the NPU got an error. We query the NPU FIRs
using flat_chip_id, not this_cpu()->chip_id, but the print still uses
this_cpu()->chip_id.

# putscom -c 0x0  0x5013C00 0x4000000000000000
[84667.176936787,3] HMI: NPU2: [Loc: UOPWR.786ECFA-Node0-Proc0] P:0
FIR#0 FIR 0x4000000000000000 mask 0x009a48180f03ffff
[84667.187268490,3] HMI: NPU2: [Loc: UOPWR.786ECFA-Node0-Proc1] P:8
FIR#0 FIR 0x4000000000000000 mask 0x009a48180f03ffff

The second line of output above is what creates the confusion: it prints
the same FIR info but with a different chip id and location code.
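
To illustrate the idea, here is a minimal standalone sketch, not the
actual skiboot code. The struct and function names (struct chip,
find_affected_chip, format_npu_fir) are hypothetical and only model the
behaviour: the handler can run on a CPU of any chip, but the log line is
built from the chip that raised the malfunction alert.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for per-chip state; not skiboot's real structures. */
struct chip {
	uint32_t id;		/* chip id, e.g. 0 or 8 */
	const char *loc_code;	/* location code string */
	uint64_t npu_fir;	/* NPU FIR value read from this chip */
	int malf_alert;		/* non-zero if this chip raised the alert */
};

/* Return the chip that raised the malfunction alert, or NULL if none did. */
static const struct chip *find_affected_chip(const struct chip *chips,
					     size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (chips[i].malf_alert)
			return &chips[i];
	return NULL;
}

/*
 * Build the log line for an NPU FIR. handler_chip_id is the chip of the
 * CPU running the handler; it is deliberately not used for the output,
 * so every CPU that takes the broadcast HMI reports the same chip.
 */
static int format_npu_fir(char *buf, size_t len, const struct chip *chips,
			  size_t n, uint32_t handler_chip_id)
{
	const struct chip *c = find_affected_chip(chips, n);

	(void)handler_chip_id;
	if (!c)
		return -1;
	return snprintf(buf, len, "HMI: NPU2: [Loc: %s] P:%u FIR#0 FIR 0x%016llx",
			c->loc_code, (unsigned int)c->id,
			(unsigned long long)c->npu_fir);
}
```

With two chips where only chip 0 has the alert set, calling
format_npu_fir() with handler_chip_id 0 or 8 produces the same line,
naming chip 0 and its location code in both cases.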

Thanks,
-Mahesh


> 
> Balbir
> 


