[PATCH v2 1/3] module: Rename module_alloc() to text_alloc() and move to kernel proper

Tue Jul 14 23:47:50 AEST 2020

On Tue, 14 Jul 2020 at 16:33, Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Tue, Jul 14, 2020 at 03:01:09PM +0200, Peter Zijlstra wrote:
> > On Tue, Jul 14, 2020 at 03:19:24PM +0300, Ard Biesheuvel wrote:
> > > So perhaps the answer is to have text_alloc() not with a 'where'
> > > argument but with a 'why' argument. Or more simply, just have separate
> > > alloc/free APIs for each case, with generic versions that can be
> > > overridden by the architecture.
> >
> > Well, there only seem to be 2 cases here, either the pointer needs to
> > fit in some immediate displacement, or not.
>
> On some arches you have a few choices for immediates depending on
> compiler options, e.g. on arm64:
>
> * +/- 128M with B
> * +/-4G with ADRP+ADD+BR
> * +/- 48/64 bits with a series of MOVK* + BR
>
> ... and you might build core kernel one way and modules another, and
> either could depend on configuration.
>
> > On x86 we seem have the advantage of a fairly large immediate
> > displacement as compared to many other architectures (due to our
> > variable sized instructions). And thus have been fairly liberal with our
> > usage of it (also our indirect jmps/calls suck, double so with
> > RETCH-POLINE).
> >
> > Still, the indirect jump, as mentioned by Russel should work for
> > arbitrarily placed code for us too.
> >
> >
> > So I'm thinking that something like:
> >
> > enum ptr_type {
> >       immediate_displacement,
> >       absolute,
> > };
> >
> > void *text_alloc(unsigned long size, enum ptr_type type)
> > {
> >       unsigned long vstart = VMALLOC_START;
> >       unsigned long vend   = VMALLOC_END;
> >
> >       if (type == immediate_displacement) {
> >               vstart = MODULES_VADDR;
> >               vend   = MODULES_END;
> >       }
> >
> >       return __vmalloc_node_range(size, TEXT_ALIGN, vstart, vend,
> >                                   GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
> >                                   NUMA_NO_NODE, _RET_IP_);
> > }
> >
> > void text_free(void *ptr)
> > {
> >       vfree(ptr);
> > }
>
> I think it'd be easier to read with separate functions, e.g.
>
>   text_alloc_imm_offset(unsigned long size);
>   text_alloc_absolute(unsigned long size);
>

On arm64, we have a 128M window close to the core kernel for modules,
and a separate 128m window for bpf  programs, which are kept in
relative branching range of each other, but could be far away from
kernel+modules, and so having 'close' and 'far' as the only
distinction is insufficient.

> > Should work for all cases. Yes, we might then want something like a per
> > arch:
> >
> >       {BPF,FTRACE,KPROBE}_TEXT_TYPE
>
> ... at that point why not:
>
>   text_alloc_ftrace();
>   text_alloc_module();
>   text_alloc_bpf();
>   text_alloc_kprobe();
>
> ... etc which an arch can alias however it wants? e.g. x86 can have
> those all go to a common text_alloc_generic(), and that could even be a
> generic option for arches that don't care to distinguish these cases.
>

That is basically what i meant with separate alloc/free APIs, which i
think is the sanest approach here.

> Then if there are new places that want to allocate text we have to
> consider their requirements when adding them, too.
>
> Thanks,
> Mark.