[PATCH v3 2/3] prandom: Convert prandom_u32_state() to __always_inline
Ryan Roberts
ryan.roberts at arm.com
Sat Jan 3 01:09:26 AEDT 2026
On 02/01/2026 13:39, Jason A. Donenfeld wrote:
> Hi Ryan,
>
> On Fri, Jan 2, 2026 at 2:12 PM Ryan Roberts <ryan.roberts at arm.com> wrote:
>> context. Given the function is just a handful of operations and doesn't
>
> How many? What's this looking like in terms of assembly?
25 instructions on arm64:
0000000000000000 <prandom_u32_state>:
0: 29401403 ldp w3, w5, [x0]
4: aa0003e1 mov x1, x0
8: 29410002 ldp w2, w0, [x0, #8]
c: 531e74a4 lsl w4, w5, #2
10: 530e3468 lsl w8, w3, #18
14: 4a0400a5 eor w5, w5, w4
18: 4a031863 eor w3, w3, w3, lsl #6
1c: 53196047 lsl w7, w2, #7
20: 53134806 lsl w6, w0, #13
24: 4a023442 eor w2, w2, w2, lsl #13
28: 4a000c00 eor w0, w0, w0, lsl #3
2c: 121b6884 and w4, w4, #0xffffffe0
30: 120d3108 and w8, w8, #0xfff80000
34: 121550e7 and w7, w7, #0xfffff800
38: 120c2cc6 and w6, w6, #0xfff00000
3c: 2a456c85 orr w5, w4, w5, lsr #27
40: 2a433504 orr w4, w8, w3, lsr #13
44: 2a4254e3 orr w3, w7, w2, lsr #21
48: 2a4030c2 orr w2, w6, w0, lsr #12
4c: 4a020066 eor w6, w3, w2
50: 4a050080 eor w0, w4, w5
54: 4a0000c0 eor w0, w6, w0
58: 29001424 stp w4, w5, [x1]
5c: 29010823 stp w3, w2, [x1, #8]
60: d65f03c0 ret
> It'd also be
> nice to have some brief analysis of other call sites to have
> confirmation this isn't blowing up other users.
I compiled defconfig before and after this patch on arm64 and compared the text
sizes:
$ ./scripts/bloat-o-meter -t vmlinux.before vmlinux.after
add/remove: 3/4 grow/shrink: 4/1 up/down: 836/-128 (708)
Function old new delta
prandom_seed_full_state 364 932 +568
pick_next_task_fair 1940 2036 +96
bpf_user_rnd_u32 104 196 +92
prandom_bytes_state 204 260 +56
e843419 at 0f2b_00012d69_e34 - 8 +8
e843419 at 0db7_00010ec3_23ec - 8 +8
e843419 at 02cb_00003767_25c - 8 +8
bpf_prog_select_runtime 448 444 -4
e843419 at 0aa3_0000cfd1_1580 8 - -8
e843419 at 0aa2_0000cfba_147c 8 - -8
e843419 at 075f_00008d8c_184 8 - -8
prandom_u32_state 100 - -100
Total: Before=19078072, After=19078780, chg +0.00%
So 708 bytes more after inlining. The main cost is prandom_seed_full_state(),
which calls prandom_u32_state() 10 times (via prandom_warmup()). I expect we
could turn that into a loop to reduce ~450 bytes overall.
I'm not really sure if 708 is good or bad...
>
>> +static __always_inline u32 prandom_u32_state(struct rnd_state *state)
>
> Why not just normal `inline`? Is gcc disagreeing with the inlinability
> of this function?
Given this needs to be called from a noinstr function, I didn't want to give the
compiler the opportunity to decide not to inline it, since in that case, some
instrumentation might end up being applied to the function body which would blow
up when called in the noinstr context.
I think the other 2 options are to keep prandom_u32_state() in the c file but
mark it noinstr or rearrange all the users so that thay don't call it until
instrumentation is allowable. The latter is something I was trying to avoid.
There is some previous discussion of this at [1].
[1] https://lore.kernel.org/all/aS65LFUfdgRPKv1l@J2N7QTR9R3/
Perhaps keeping prandom_u32_state() in the c file and making it noinstr is the
best compromise?
Thanks,
Ryan
>
> Jason
More information about the Linuxppc-dev
mailing list