[RFC PATCH] hwmon: (peci/cputemp) Number cores as seen by host system

Zev Weiss zev at bewilderbeest.net
Wed Feb 22 10:55:53 AEDT 2023


On Sat, Feb 18, 2023 at 01:20:14PM PST, Winiarska, Iwona wrote:
>On Fri, 2023-02-10 at 10:45 -0800, Guenter Roeck wrote:
>> On Thu, Feb 09, 2023 at 05:48:41PM -0800, Zev Weiss wrote:
>> > On Thu, Feb 09, 2023 at 04:26:47PM PST, Guenter Roeck wrote:
>> > > On 2/9/23 16:14, Zev Weiss wrote:
>> > > > On Thu, Feb 09, 2023 at 09:50:01AM PST, Guenter Roeck wrote:
>> > > > > On Wed, Feb 08, 2023 at 05:16:32PM -0800, Zev Weiss wrote:
>> > > > > > While porting OpenBMC to a new platform with a Xeon Gold 6314U CPU
>> > > > > > (Ice Lake, 32 cores), I discovered that the core numbering used by
>> > > > > > the PECI interface appears to correspond to the cores that are
>> > > > > > present in the physical silicon, rather than those that are
>> > > > > > actually enabled and usable by the host OS (i.e. it includes cores
>> > > > > > that the chip was manufactured with but later had fused off).
>> > > > > >
>> > > > > > Thus far the cputemp driver has transparently exposed that numbering
>> > > > > > to userspace in its 'tempX_label' sysfs files, making the core
>> > > > > > numbers it reported not align with the core numbering used by the
>> > > > > > host system, which seems like an unfortunate source of confusion.
>> > > > > >
>> > > > > > We can instead use a separate counter to label the cores in a
>> > > > > > contiguous fashion (0 through numcores-1) so that the core numbering
>> > > > > > reported by the PECI cputemp driver matches the numbering seen by
>> > > > > > the host.
>> > > > > >
>> > > > >
>> > > > > I don't really have an opinion if this change is desirable or not.
>> > > > > I suspect one could argue either way. I'll definitely want to see
>> > > > > feedback from others. Any comments or thoughts, anyone?
>> > > > >
>> > > >
>> > > > Agreed, I'd definitely like to get some input from Intel folks on this.
>> > > >
>> > > > Though since I realize my initial email didn't quite explain this
>> > > > explicitly, I should probably clarify with an example how weird the
>> > > > numbering can get with the existing code -- on the 32-core CPU I'm
>> > > > working with at the moment, the tempX_label files produce the following
>> > > > core numbers:
>> > > >
>> > > >     Core 0
>> > > >     Core 1
>> > > >     Core 2
>> > > >     Core 3
>> > > >     Core 4
>> > > >     Core 5
>> > > >     Core 6
>> > > >     Core 7
>> > > >     Core 8
>> > > >     Core 9
>> > > >     Core 11
>> > > >     Core 12
>> > > >     Core 13
>> > > >     Core 14
>> > > >     Core 15
>> > > >     Core 18
>> > > >     Core 20
>> > > >     Core 22
>> > > >     Core 23
>> > > >     Core 24
>> > > >     Core 26
>> > > >     Core 27
>> > > >     Core 28
>> > > >     Core 29
>> > > >     Core 30
>> > > >     Core 31
>> > > >     Core 33
>> > > >     Core 34
>> > > >     Core 35
>> > > >     Core 36
>> > > >     Core 38
>> > > >     Core 39
>> > > >
>> > > > i.e. it's not just a different permutation of the expected core numbers,
>> > > > we end up with gaps (e.g. the nonexistence of core 10), and core numbers
>> > > > well in excess of the number of cores the processor really "has" (e.g.
>> > > > number 39) -- all of which seems like a rather confusing thing to see in
>> > > > your BMC's sensor readings.
>> > > >
>> > >
>> > > Sure, but what do you see with /proc/cpuinfo and with coretemp
>> > > on the host? It might be even more confusing if the core numbers
>> > > reported by the peci driver don't match the core numbers provided
>> > > by other tools.
>> > >
>> >
>> > The host sees them numbered as the usual 0-31 you'd generally expect, and
>> > assigned to those cores in the same increasing order -- hence the patch
>> > bringing the two into alignment with each other.  Currently only cores 0
>> > through 9 match up between the two, and the rest are off by somewhere
>> > between one and eight.
>> >
>>
>> Hmm, interesting. It is not sequential on my large system (Intel(R) Xeon(R)
>> Gold 6154). I also know for sure that core IDs on Intel server CPUs are
>> typically not sequential. The processor number is sequential, but the core
>> ID isn't. On my system, the output from the "sensors" command (that is,
>> from the coretemp driver) matches the non-sequential core IDs from
>> /proc/cpuinfo, which is exactly how I would expect it to be.
>>
>> Guenter
>
>On Linux, from host side, core ID is obtained from EDX of CPUID(EAX=0xb).
>Unfortunately, the value exposed to the host (and whether it's in sequential or
>non-sequential form) can vary from platform to platform (which BTW is why on
>Linux, core ID shouldn't really be used for any logic related to task placement
>- topology info should be used instead).
>From BMC perspective - we'll always get the non-sequential form.
>
>If we just apply the patch proposed by Zev, we'll end up being consistent on one
>set of platforms and inconsistent on another set of platforms.
>If we want to make things consistent, we need a different approach - either by
>obtaining additional information over PECI or by limiting the scope of the
>proposed change to specific platforms.
>
>Thanks
>-Iwona
>

Okay, I was sort of afraid of something like that.

Does PECI provide the necessary information to reliably map its 
(physical silicon I presume) core numbers to the logical numbers seen by 
the host OS?  The PECI specs I have don't seem to mention anything along 
those lines as far as I can see, though perhaps there are newer or more 
detailed ones I don't have access to.
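
For concreteness, a rough sketch (in Python, purely illustrative, not the
driver code) of what the contiguous relabeling discussed in this thread
amounts to for the core list quoted above.  It assumes the host numbers its
cores in the same increasing order as PECI, which matches what was reported
for the Xeon Gold 6314U but, per Iwona's reply, does not hold on every
platform.  The names PECI_CORES and contiguous_labels are made up for the
example.

```python
# PECI-side core numbers from the tempX_label files quoted earlier in the
# thread (32 enabled cores, numbered up to 39 with gaps).
PECI_CORES = [
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
    11, 12, 13, 14, 15,
    18, 20,
    22, 23, 24,
    26, 27, 28, 29, 30, 31,
    33, 34, 35, 36,
    38, 39,
]


def contiguous_labels(peci_cores):
    """Map each PECI core number to a 0-based contiguous label.

    Assumption: the host enumerates enabled cores in the same increasing
    order as the PECI numbering.
    """
    return {peci: idx for idx, peci in enumerate(sorted(peci_cores))}


labels = contiguous_labels(PECI_CORES)

# How far each PECI number sits from its contiguous label.
offsets = {peci: peci - idx for peci, idx in labels.items()}
unchanged = [peci for peci, off in offsets.items() if off == 0]
shifted = [off for off in offsets.values() if off != 0]

print("cores with unchanged numbers:", unchanged)
print("smallest/largest shift:", min(shifted), max(shifted))
```

Under that assumption only PECI cores 0 through 9 keep their numbers, and
the remaining labels shift by between one and eight.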

If not, how difficult would it be to classify known CPU models by 
distinct core-numbering schemes to handle it "manually" in the driver?  
If the necessary information is available I could try to develop a patch 
for it.
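
As a strawman for the "manual" route, something along these lines
(illustrative Python rather than kernel C).  Everything here is
hypothetical: the table keyed on model name, the HostNumbering enum and the
label_for helper are invented for the sketch, and the two table entries
reflect only what was reported in this thread, not a verified
classification.  It also assumes that on "non-sequential" platforms the
host core IDs match the PECI numbers, which would need confirming.

```python
from enum import Enum


class HostNumbering(Enum):
    """How the host OS is assumed to number cores on a given CPU model."""
    SEQUENTIAL = "sequential"          # host sees 0..N-1
    NON_SEQUENTIAL = "non-sequential"  # host core IDs have gaps


# Hypothetical lookup table; entries taken from reports in this thread only.
# A real driver would more likely key on CPU family/model IDs.
HOST_NUMBERING_BY_MODEL = {
    "Xeon Gold 6314U": HostNumbering.SEQUENTIAL,
    "Xeon Gold 6154": HostNumbering.NON_SEQUENTIAL,
}


def label_for(model, peci_core, enabled_cores):
    """Pick the core number to expose for one PECI core.

    Known-sequential models get a contiguous index; unknown or
    non-sequential models keep the raw PECI number (today's behavior).
    """
    scheme = HOST_NUMBERING_BY_MODEL.get(model)
    if scheme is HostNumbering.SEQUENTIAL:
        return sorted(enabled_cores).index(peci_core)
    return peci_core


enabled = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
print(label_for("Xeon Gold 6314U", 11, enabled))
print(label_for("Xeon Gold 6154", 11, enabled))
print(label_for("some unlisted model", 11, enabled))
```

Falling back to the raw PECI number for unlisted models keeps the existing
labels wherever the numbering scheme is not known.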


Thanks,
Zev


