[RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

Zev Weiss zev at bewilderbeest.net
Fri Feb 10 11:14:40 AEDT 2023


On Thu, Feb 09, 2023 at 09:50:01AM PST, Guenter Roeck wrote:
>On Wed, Feb 08, 2023 at 05:16:32PM -0800, Zev Weiss wrote:
>> While porting OpenBMC to a new platform with a Xeon Gold 6314U CPU
>> (Ice Lake, 32 cores), I discovered that the core numbering used by the
>> PECI interface appears to correspond to the cores that are present in
>> the physical silicon, rather than those that are actually enabled and
>> usable by the host OS (i.e. it includes cores that the chip was
>> manufactured with but later had fused off).
>>
>> Thus far the cputemp driver has transparently exposed that numbering
>> to userspace in its 'tempX_label' sysfs files, making the core numbers
>> it reported not align with the core numbering used by the host system,
>> which seems like an unfortunate source of confusion.
>>
>> We can instead use a separate counter to label the cores in a
>> contiguous fashion (0 through numcores-1) so that the core numbering
>> reported by the PECI cputemp driver matches the numbering seen by the
>> host.
>>
>
>I don't really have an opinion if this change is desirable or not.
>I suspect one could argue either way. I'll definitely want to see
>feedback from others. Any comments or thoughts, anyone?
>

Agreed, I'd definitely like to get some input from Intel folks on this.

Since my initial email didn't spell this out, here is an example of how
odd the numbering can get with the existing code. On the 32-core CPU I'm
working with at the moment, the tempX_label files report the following
core numbers:

     Core 0
     Core 1
     Core 2
     Core 3
     Core 4
     Core 5
     Core 6
     Core 7
     Core 8
     Core 9
     Core 11
     Core 12
     Core 13
     Core 14
     Core 15
     Core 18
     Core 20
     Core 22
     Core 23
     Core 24
     Core 26
     Core 27
     Core 28
     Core 29
     Core 30
     Core 31
     Core 33
     Core 34
     Core 35
     Core 36
     Core 38
     Core 39

In other words, it's not just a different permutation of the expected
core numbers. There are gaps (e.g. there is no core 10), and some core
numbers are well beyond the number of cores the processor really "has"
(e.g. core 39 on a 32-core part). All of that seems rather confusing to
see in your BMC's sensor readings.
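
To make the proposed renumbering concrete, here is a minimal userspace
sketch of the idea, not the patch itself. It assumes the set of present
cores is available as a 64-bit mask indexed by PECI core number; the
names host_core_label() and mask_from_indices() are hypothetical and
exist only for this illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the driver): map a PECI core
 * index to a contiguous, host-style label.  The label is the number of
 * present cores with a lower PECI index.  Returns -1 if the core is
 * not present in core_mask (e.g. it was fused off).
 */
static int host_core_label(uint64_t core_mask, unsigned int peci_idx)
{
	uint64_t below;
	int label = 0;

	if (peci_idx >= 64 || !(core_mask & (UINT64_C(1) << peci_idx)))
		return -1;

	/* Keep only the bits strictly below peci_idx, then count them. */
	below = core_mask & ((UINT64_C(1) << peci_idx) - 1);
	while (below) {
		below &= below - 1;	/* clear the lowest set bit */
		label++;
	}

	return label;
}

/* Hypothetical helper: build a core mask from a list of PECI indices. */
static uint64_t mask_from_indices(const unsigned int *idx, unsigned int n)
{
	uint64_t mask = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		if (idx[i] < 64)
			mask |= UINT64_C(1) << idx[i];
	}

	return mask;
}
```

With a mask built from the 32 core numbers listed above, PECI core 11
would be labeled 10 and PECI core 39 would be labeled 31, while the
nonexistent core 10 yields -1.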


Thanks,
Zev


