[PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle

Christophe Leroy (CS GROUP) chleroy at kernel.org
Thu Feb 26 00:54:28 AEDT 2026


Hi Hegde,

On 25/02/2026 at 14:33, Shrikanth Hegde wrote:
> 
> 
> On 2/25/26 4:44 PM, Christophe Leroy (CS GROUP) wrote:
>> Hi Hegde,
>>
>> On 25/02/2026 at 11:34, Shrikanth Hegde wrote:
>>> Hi Christophe.
>>>
>>> On 2/25/26 3:15 PM, Christophe Leroy (CS GROUP) wrote:
>>>>
>>>> Hope it is more explicit now.
>>>>
>>>
>>> Got it. The main concern was about the additional computation that
>>> sched_clock does,
>>> not any additional paths per se.
>>>
>>> yes, that would be possible,
>>>
>>>
>>> How about we do below? This adds only one subtraction.
>>> This achieves the same outcome.
>>
>> It adds a bit more than just a subtraction. It adds a call to an
>> external function.
> 
> I think we should make it always inline and move it to time.h
> 
>>
>> 00000164 <my_account_cpu_user_entry>:
>>   164:    94 21 ff f0     stwu    r1,-16(r1)
>>   168:    7c 08 02 a6     mflr    r0
>>   16c:    90 01 00 14     stw     r0,20(r1)
>>   170:    93 e1 00 0c     stw     r31,12(r1)
>>   174:    7f ec 42 e6     mftb    r31
>>   178:    48 00 00 01     bl      178 <my_account_cpu_user_entry+0x14>
>>              178: R_PPC_REL24    get_boot_tb
>>   17c:    81 02 00 08     lwz     r8,8(r2)
>>   180:    81 22 00 28     lwz     r9,40(r2)
>>   184:    7c 84 f8 50     subf    r4,r4,r31
>>   188:    7d 29 40 50     subf    r9,r9,r8
>>   18c:    7d 29 22 14     add     r9,r9,r4
>>   190:    90 82 00 24     stw     r4,36(r2)
>>   194:    91 22 00 08     stw     r9,8(r2)
>>   198:    80 01 00 14     lwz     r0,20(r1)
>>   19c:    83 e1 00 0c     lwz     r31,12(r1)
>>   1a0:    7c 08 03 a6     mtlr    r0
>>   1a4:    38 21 00 10     addi    r1,r1,16
>>   1a8:    4e 80 00 20     blr
>>
>> 000001ac <my_account_cpu_user_exit>:
>>   1ac:    94 21 ff f0     stwu    r1,-16(r1)
>>   1b0:    7c 08 02 a6     mflr    r0
>>   1b4:    90 01 00 14     stw     r0,20(r1)
>>   1b8:    93 e1 00 0c     stw     r31,12(r1)
>>   1bc:    7f ec 42 e6     mftb    r31
>>   1c0:    48 00 00 01     bl      1c0 <my_account_cpu_user_exit+0x14>
>>              1c0: R_PPC_REL24    get_boot_tb
>>   1c4:    81 02 00 0c     lwz     r8,12(r2)
>>   1c8:    81 22 00 24     lwz     r9,36(r2)
>>   1cc:    7c 84 f8 50     subf    r4,r4,r31
>>   1d0:    7d 29 40 50     subf    r9,r9,r8
>>   1d4:    7d 29 22 14     add     r9,r9,r4
>>   1d8:    90 82 00 28     stw     r4,40(r2)
>>   1dc:    91 22 00 0c     stw     r9,12(r2)
>>   1e0:    80 01 00 14     lwz     r0,20(r1)
>>   1e4:    83 e1 00 0c     lwz     r31,12(r1)
>>   1e8:    7c 08 03 a6     mtlr    r0
>>   1ec:    38 21 00 10     addi    r1,r1,16
>>   1f0:    4e 80 00 20     blr
>>
>>
>> I really still can't see the point of this subtraction.
>>
>> At one place we do
>>
>>      tb1 = mftb1;
>>
>>      acct->utime += (tb1 - acct->starttime_user);
>>      acct->starttime = tb1;
>>
>> At the other place we do
>>
>>      tb2 = mftb2;
>>
>>      acct->stime += (tb2 - acct->starttime);
>>      acct->starttime_user = tb2;
>>
>> So at the end we have
>>
>>      acct->utime += mftb1 - mftb2;
>>      acct->stime += mftb2 - mftb1;
>>
>> You want to change to
>>      tb1 = mftb1 - boot_tb;
>>      tb2 = mftb2 - boot_tb;
>>
>> At the end we would get
>>
>>      acct->utime += mftb1 - boot_tb - mftb2 + boot_tb = mftb1 - mftb2;
>>      acct->stime += mftb2 - boot_tb - mftb1 + boot_tb = mftb2 - mftb1;
>>
>> So what's the point in doing such a useless subtraction that cancels out
>> at the end? What am I missing?
>>
> 
> I had a similar thought, but I saw the data below when I ran this on the
> system.
> 
> This was the stats seen on PowerNV system with 144 CPUs.
> Nothing is running on the system after boot. So it is mostly idle.
> 
> 
> ======== With the series applied ===
> 
> cat /proc/stat | head
> cpu  1494 0 135607576 9628633227 16876 142 63 0 0 0
> cpu0 0 0 8 67807311 0 2 40 0 0 0
> cpu1 0 0 6 67807349 0 0 0 0 0 0
> 
> cat /proc/uptime
> 48.32 96286332.82   << Note this value is far too large. The system value is
> also far too large.
> 
> ========= without the series(tip/master) ===============
> cat /proc/stat | head
> cpu  2003 0 67866261 859414 15923 249 66 0 0 0
> cpu0 5 0 23 5595 461 2 38 0 0 0
> cpu1 0 0 9 6092 21 0 3 0 0 0
> 
> cat /proc/uptime
> 61.29 8594.82    << This is right. 144*61 = 8784.
> 
> But note the reported system time, i.e. 67866261. It is again far too large,
> and very close to the actual mftb value rather than a difference,
> i.e. we have paths where tb1 is not taken, so tb2 is
> effectively mftb - 0.
> 
> 
> ========= with proposed fix of mftb - boot_tb ===============
> cat /proc/stat | head
> cpu  5187 0 10996 2025690 16566 765 184 0 0 0
> cpu0 9 0 28 14096 65 6 108 0 0 0
> cpu1 4 0 15 14277 0 0 2 0 0 0
> 
> cat /proc/uptime
> 142.97 20257.42     << Looks correct, close to 142*144 = 20448
> 
> =============================================================
> 
> Now let's go to CONFIG_VIRT_CPU_ACCOUNTING_GEN=y
> 
> cat /proc/stat | head
> cpu  1804 0 3003 791760 15695 0 0 0 0 0
> cpu0 22 0 46 5535 0 0 0 0 0 0
> cpu1 0 0 7 5637 0 0 0 0 0 0
> 
> cat /proc/uptime
> 56.49 7918.05      << Looks correct, close to 56*144
> 
> 
> ================================================
> 

I think I'm starting to understand now.

I think the problem is that acct->starttime has an invalid value the 
very first time it is used.
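
To make that concrete, here is a minimal user-space sketch of the two
accounting steps quoted above. It is not the kernel code: the struct and
function names are made up for illustration, and the timebase is passed
in as a plain argument instead of being read with mftb.

```c
#include <stdint.h>

/* Hypothetical model of the accounting fields, for illustration only. */
struct acct_model {
	uint64_t utime;
	uint64_t stime;
	uint64_t starttime;      /* timebase at last kernel entry */
	uint64_t starttime_user; /* timebase at last return to user */
};

/* Kernel entry: the span since the last return to user is user time. */
static void model_kernel_entry(struct acct_model *a, uint64_t tb)
{
	a->utime += tb - a->starttime_user;
	a->starttime = tb;
}

/* Return to user: the span since the last kernel entry is system time. */
static void model_kernel_exit(struct acct_model *a, uint64_t tb)
{
	a->stime += tb - a->starttime;
	a->starttime_user = tb;
}
```

With a zeroed struct, a first model_kernel_exit() at timebase T charges
the whole of T to stime, which would match a system time close to the
raw mftb value. After that first call, only differences are accumulated.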

We are probably lacking an initial value in paca->accounting.starttime.
This should likely be initialised from mftb in head_64.S, in
start_here_common for the main CPU and __secondary_start for the other CPUs,
or maybe at a higher level in C, in setup_arch() and start_secondary().
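
As a rough sketch of what such an initialisation would achieve, again in
user space and with made-up names (seeded_acct, seeded_acct_init and
seeded_kernel_exit are hypothetical, and the real fix would read mftb
rather than take the timebase as an argument):

```c
#include <stdint.h>

/* Hypothetical model of the two timestamps, for illustration only. */
struct seeded_acct {
	uint64_t stime;
	uint64_t starttime;      /* timebase at last kernel entry */
	uint64_t starttime_user; /* timebase at last return to user */
};

/* Seed both timestamps with the current timebase at CPU bring-up. */
static void seeded_acct_init(struct seeded_acct *a, uint64_t now)
{
	a->stime = 0;
	a->starttime = now;
	a->starttime_user = now;
}

/* Same accounting step as in the existing code path. */
static void seeded_kernel_exit(struct seeded_acct *a, uint64_t tb)
{
	a->stime += tb - a->starttime;
	a->starttime_user = tb;
}
```

Once starttime is seeded, the first delta only covers the time since
bring-up, so no subtraction of boot_tb should be needed in the hot path.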

Christophe

