[PATCH 04/15] powerpc/time: Prepare to stop elapsing in dynticks-idle
Christophe Leroy (CS GROUP)
chleroy at kernel.org
Thu Feb 26 00:54:28 AEDT 2026
Hi Hegde,
On 25/02/2026 at 14:33, Shrikanth Hegde wrote:
>
>
> On 2/25/26 4:44 PM, Christophe Leroy (CS GROUP) wrote:
>> Hi Hegde,
>>
>> On 25/02/2026 at 11:34, Shrikanth Hegde wrote:
>>> Hi Christophe.
>>>
>>> On 2/25/26 3:15 PM, Christophe Leroy (CS GROUP) wrote:
>>>>
>>>> Hope it is more explicit now.
>>>>
>>>
>>> Got it. The main concern was about the additional computation that
>>> sched_clock does, not about any additional paths per se.
>>>
>>> Yes, that would be possible.
>>>
>>>
>>> How about we do the following? It adds only one subtraction and
>>> achieves the same outcome.
>>
>> It adds a bit more than just a subtraction. It adds a call to an
>> extern function.
>
> I think we should make it always inline and move it to time.h
>
>>
>> 00000164 <my_account_cpu_user_entry>:
>> 164: 94 21 ff f0 stwu r1,-16(r1)
>> 168: 7c 08 02 a6 mflr r0
>> 16c: 90 01 00 14 stw r0,20(r1)
>> 170: 93 e1 00 0c stw r31,12(r1)
>> 174: 7f ec 42 e6 mftb r31
>> 178: 48 00 00 01 bl 178 <my_account_cpu_user_entry+0x14>
>> 178: R_PPC_REL24 get_boot_tb
>> 17c: 81 02 00 08 lwz r8,8(r2)
>> 180: 81 22 00 28 lwz r9,40(r2)
>> 184: 7c 84 f8 50 subf r4,r4,r31
>> 188: 7d 29 40 50 subf r9,r9,r8
>> 18c: 7d 29 22 14 add r9,r9,r4
>> 190: 90 82 00 24 stw r4,36(r2)
>> 194: 91 22 00 08 stw r9,8(r2)
>> 198: 80 01 00 14 lwz r0,20(r1)
>> 19c: 83 e1 00 0c lwz r31,12(r1)
>> 1a0: 7c 08 03 a6 mtlr r0
>> 1a4: 38 21 00 10 addi r1,r1,16
>> 1a8: 4e 80 00 20 blr
>>
>> 000001ac <my_account_cpu_user_exit>:
>> 1ac: 94 21 ff f0 stwu r1,-16(r1)
>> 1b0: 7c 08 02 a6 mflr r0
>> 1b4: 90 01 00 14 stw r0,20(r1)
>> 1b8: 93 e1 00 0c stw r31,12(r1)
>> 1bc: 7f ec 42 e6 mftb r31
>> 1c0: 48 00 00 01 bl 1c0 <my_account_cpu_user_exit+0x14>
>> 1c0: R_PPC_REL24 get_boot_tb
>> 1c4: 81 02 00 0c lwz r8,12(r2)
>> 1c8: 81 22 00 24 lwz r9,36(r2)
>> 1cc: 7c 84 f8 50 subf r4,r4,r31
>> 1d0: 7d 29 40 50 subf r9,r9,r8
>> 1d4: 7d 29 22 14 add r9,r9,r4
>> 1d8: 90 82 00 28 stw r4,40(r2)
>> 1dc: 91 22 00 0c stw r9,12(r2)
>> 1e0: 80 01 00 14 lwz r0,20(r1)
>> 1e4: 83 e1 00 0c lwz r31,12(r1)
>> 1e8: 7c 08 03 a6 mtlr r0
>> 1ec: 38 21 00 10 addi r1,r1,16
>> 1f0: 4e 80 00 20 blr
>>
>>
>> I still really can't see the point of this subtraction.
>>
>> In one place we do
>>
>> tb1 = mftb1;
>>
>> acct->utime += (tb1 - acct->starttime_user);
>> acct->starttime = tb1;
>>
>> In the other place we do
>>
>> tb2 = mftb2;
>>
>> acct->stime += (tb2 - acct->starttime);
>> acct->starttime_user = tb2;
>>
>> So in the end we have
>>
>> acct->utime += mftb1 - mftb2;
>> acct->stime += mftb2 - mftb1;
>>
>> You want to change this to
>> tb1 = mftb1 - boot_tb;
>> tb2 = mftb2 - boot_tb;
>>
>> In the end we would get
>>
>> acct->utime += mftb1 - boot_tb - mftb2 + boot_tb = mftb1 - mftb2;
>> acct->stime += mftb2 - boot_tb - mftb1 + boot_tb = mftb2 - mftb1;
>>
>> So what's the point of such a useless subtraction that cancels out
>> in the end? What am I missing?
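The cancellation argument above can be checked with a small userspace model. This is a sketch only: struct acct_model, user_entry(), user_exit(), replay() and the example_trace values are hypothetical stand-ins for the per-CPU accounting fields, not the kernel's code. It assumes both start fields are seeded from the same shifted clock as every later read; under that assumption every charged interval is a difference of two shifted reads, so any offset (boot_tb or otherwise) cancels, even when unsigned 64-bit arithmetic wraps.

```c
#include <stdint.h>

/* Toy model of the fields discussed above; not the kernel's struct. */
struct acct_model {
	uint64_t utime, stime;
	uint64_t starttime, starttime_user;
};

/* Entering the kernel from user mode: charge the elapsed user interval. */
static void user_entry(struct acct_model *a, uint64_t tb)
{
	a->utime += tb - a->starttime_user;
	a->starttime = tb;
}

/* Returning to user mode: charge the elapsed system interval. */
static void user_exit(struct acct_model *a, uint64_t tb)
{
	a->stime += tb - a->starttime;
	a->starttime_user = tb;
}

/*
 * Replay n raw timebase reads, each shifted by the same offset.
 * raw[0] seeds both start fields (CPU assumed to be in the kernel),
 * then reads alternate: exit to user, entry from user, exit, ...
 */
static struct acct_model replay(const uint64_t *raw, int n, uint64_t offset)
{
	struct acct_model a = { 0, 0, raw[0] - offset, raw[0] - offset };

	for (int i = 1; i < n; i++) {
		if (i % 2)
			user_exit(&a, raw[i] - offset);
		else
			user_entry(&a, raw[i] - offset);
	}
	return a;
}

/* Illustrative timebase reads, made up for this example. */
static const uint64_t example_trace[5] = { 1000, 1300, 1800, 1850, 2400 };
```

With example_trace, replay() yields utime 1050 and stime 350 whether the offset is 0, 900, or 5000 (larger than every read). The model only shows that the offset is harmless once the start fields hold a sane value; it says nothing about the very first interval.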
>>
>
> I had a similar thought, but I saw the data below when running exec on
> the system.
>
> These are the stats seen on a PowerNV system with 144 CPUs. Nothing
> is running on the system after boot, so it is mostly idle.
>
>
> ======== With the series applied ===
>
> cat /proc/stat | head
> cpu 1494 0 135607576 9628633227 16876 142 63 0 0 0
> cpu0 0 0 8 67807311 0 2 40 0 0 0
> cpu1 0 0 6 67807349 0 0 0 0 0 0
>
> cat /proc/uptime
> 48.32 96286332.82 << Note this value is far too large. The system value
> is also huge.
>
> ========= without the series (tip/master) ===============
> cat /proc/stat | head
> cpu 2003 0 67866261 859414 15923 249 66 0 0 0
> cpu0 5 0 23 5595 461 2 38 0 0 0
> cpu1 0 0 9 6092 21 0 3 0 0 0
>
> cat /proc/uptime
> 61.29 8594.82 << This is right. 144*61 = 8784.
>
> But note the system time reported, i.e. 67866261. It is again far too
> large, and very close to the actual mftb value rather than to a
> difference. I.e. we have paths where tb1 is never taken, so tb2 is
> effectively mftb - 0.
>
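The "mftb - 0" observation can be illustrated with a minimal model, assuming the start field has simply never been written and is still zero. The names acct_model, user_exit() and first_stime_charge() are hypothetical; this is not the kernel's code. With starttime at zero, the first system-time charge is the whole timebase value. Subtracting an offset such as boot_tb from every read shrinks that first charge to roughly the time since the offset was sampled, which would explain why the proposed change makes the numbers look sane.

```c
#include <stdint.h>

/* Toy model: starttime was never initialised, so it is still 0. */
struct acct_model {
	uint64_t stime;
	uint64_t starttime;
};

/* Return to user mode: charge (now - starttime) as system time.
 * The starttime_user update is omitted; it does not affect this charge. */
static void user_exit(struct acct_model *a, uint64_t tb)
{
	a->stime += tb - a->starttime;
}

/* First system-time charge on a CPU whose starttime is still 0,
 * when every timebase read is shifted by offset. */
static uint64_t first_stime_charge(uint64_t raw_tb, uint64_t offset)
{
	struct acct_model a = { .stime = 0, .starttime = 0 };

	user_exit(&a, raw_tb - offset);
	return a.stime;
}
```

For a made-up raw timebase of 5000000, the first charge is 5000000 with no offset and 10000 with an offset of 4990000.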
>
> ========= with proposed fix of mftb - boot_tb ===============
> cat /proc/stat | head
> cpu 5187 0 10996 2025690 16566 765 184 0 0 0
> cpu0 9 0 28 14096 65 6 108 0 0 0
> cpu1 4 0 15 14277 0 0 2 0 0 0
>
> cat /proc/uptime
> 142.97 20257.42 << Looks correct, since 142*144 = 20448 is close.
>
> =============================================================
>
> Now let's look at CONFIG_VIRT_CPU_ACCOUNTING_GEN=y:
>
> cat /proc/stat | head
> cpu 1804 0 3003 791760 15695 0 0 0 0 0
> cpu0 22 0 46 5535 0 0 0 0 0 0
> cpu1 0 0 7 5637 0 0 0 0 0 0
>
> cat /proc/uptime
> 56.49 7918.05 << Looks correct, close to 56*144 = 8064.
>
>
> ================================================
>
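The "uptime × number of CPUs" reasoning used on the data above can be written as a quick plausibility check. The second /proc/uptime field is idle time summed over all CPUs, so it cannot legitimately exceed uptime multiplied by the CPU count. idle_is_plausible() is a hypothetical helper, not an existing tool.

```c
#include <stdbool.h>

/*
 * Coarse sanity check on the two /proc/uptime fields: idle time summed
 * over all CPUs cannot exceed uptime multiplied by the CPU count.
 */
static bool idle_is_plausible(double uptime_s, double idle_sum_s,
			      unsigned int ncpus)
{
	return idle_sum_s <= uptime_s * (double)ncpus;
}
```

Applied to the four runs quoted above, only the "series applied" run (48.32 s of uptime, 96286332.82 s of idle, 144 CPUs) fails. The tip/master run passes even though its system time is wrong, so the check catches only the idle-side symptom.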
I think I'm starting to understand now.
The problem is probably that acct->starttime holds an invalid value the
very first time it is used: we are likely missing an initial value in
paca->accounting.starttime.
It should probably be initialised from mftb, either in head_64.S (in
start_here_common for the boot CPU and __secondary_start for the other
CPUs), or at a higher level in C, in setup_arch() and start_secondary().
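If that diagnosis is right, seeding the start fields once at CPU bring-up removes the bogus first charge without any offset on the hot paths. Below is a userspace sketch of the idea. acct_model, user_exit(), acct_init_start() and first_stime() are hypothetical names; the real change would write paca->accounting.starttime from mftb at one of the places listed above.

```c
#include <stdint.h>

/* Toy model of the per-CPU accounting fields; not the kernel's struct. */
struct acct_model {
	uint64_t utime, stime;
	uint64_t starttime, starttime_user;
};

/* Return to user mode: charge the elapsed system interval (raw timebase). */
static void user_exit(struct acct_model *a, uint64_t tb)
{
	a->stime += tb - a->starttime;
	a->starttime_user = tb;
}

/*
 * Hypothetical bring-up hook: stamp both start fields with the timebase
 * read when the CPU comes up, so the first charge only covers time since
 * bring-up.
 */
static void acct_init_start(struct acct_model *a, uint64_t tb_now)
{
	a->starttime = tb_now;
	a->starttime_user = tb_now;
}

/* First system-time charge, with (init != 0) or without the hook. */
static uint64_t first_stime(uint64_t tb_bringup, uint64_t tb_exit, int init)
{
	struct acct_model a = { 0, 0, 0, 0 };

	if (init)
		acct_init_start(&a, tb_bringup);
	user_exit(&a, tb_exit);
	return a.stime;
}
```

With made-up reads of 4990000 at bring-up and 5000000 at the first return to user mode, the first charge is 10000 with the hook and 5000000 without it. The accounting paths keep using raw mftb values throughout.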
Christophe
More information about the Linuxppc-dev mailing list