[PATCH v2 11/19] powerpc/rtas: add work area allocator
Nathan Lynch
nathanl at linux.ibm.com
Thu Feb 9 01:48:01 AEDT 2023
Michael Ellerman <mpe at ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm.com at kernel.org> writes:
>> diff --git a/arch/powerpc/include/asm/rtas-work-area.h b/arch/powerpc/include/asm/rtas-work-area.h
>> new file mode 100644
>> index 000000000000..76ccb039cc37
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/rtas-work-area.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef POWERPC_RTAS_WORK_AREA_H
>> +#define POWERPC_RTAS_WORK_AREA_H
>
> The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.
OK. (will change in all new headers)
>> +static struct rtas_work_area_allocator_state {
>> + struct gen_pool *gen_pool;
>> + char *arena;
>> + struct mutex mutex; /* serializes allocations */
>> + struct wait_queue_head wqh;
>> + mempool_t descriptor_pool;
>> + bool available;
>> +} rwa_state_ = {
>> + .mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
>> + .wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
>> +};
>> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;
>
> I assumed the pointer was so you could swap this out at runtime or
> something, but I don't think you do.
>
> Any reason not to drop the pointer and just use rwa_state.foo accessors?
> That would also allow the struct to be anonymous.
>
> Or if you have the pointer you can at least make it NULL prior to init
> and avoid the need for "available".
I think it's there because earlier versions of this that I never posted
had unit tests. I'll either resurrect those or reduce the indirection.
>> +/*
>> + * A single work area buffer and descriptor to serve requests early in
>> + * boot before the allocator is fully initialized.
>> + */
>> +static bool early_work_area_in_use __initdata;
>> +static char early_work_area_buf[SZ_4K] __initdata;
>
> That should be page aligned I think?
Yes. It happens to be safe in this version because ibm,get-system-parameter,
which has no alignment requirement, is the only function used early
enough to use the buffer. But that's too fragile.
>> +static struct rtas_work_area early_work_area __initdata = {
>> + .buf = early_work_area_buf,
>> + .size = sizeof(early_work_area_buf),
>> +};
>> +
>> +
>> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
>> +{
>> + WARN_ON(size > early_work_area.size);
>> + WARN_ON(early_work_area_in_use);
>> + early_work_area_in_use = true;
>> + memset(early_work_area.buf, 0, early_work_area.size);
>> + return &early_work_area;
>> +}
>> +
>> +static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
>> +{
>> + WARN_ON(work_area != &early_work_area);
>> + WARN_ON(!early_work_area_in_use);
>> + early_work_area_in_use = false;
>> +}
>> +
>> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
>> +{
>> + struct rtas_work_area *area;
>> + unsigned long addr;
>> +
>> + might_sleep();
>> +
>> + WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
>> + size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
>
> This seems unsafe.
>
> If you return a buffer smaller than the caller asks for they're likely
> to read/write past the end of it and corrupt memory.
OK, let's figure out another way to handle this.
> AFAIK genalloc doesn't have guard pages or anything fancy to save us
> from that - but maybe I'm wrong, I've never used it.
Yeah we would have to build our own thing on top of it. And I don't
think it could be something that traps on access, it would have to be a
check in rtas_work_area_free(), after the fact.
> There's only three callers in the end, seems like we should just return
> NULL if the size is too large and have callers check the return value.
There are more conversions to do, and a property I hope to maintain is
that requests can't fail. Existing users of rtas_data_buf don't have
error paths for failure to acquire the buffer.
I believe the allocation size passed to rtas_work_area_alloc() can be
known at build time in all cases. Maybe we could prevent inappropriate
requests from being built with a compile-time assertion (untested):
/* rtas-work-area.h */
static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz)
{
static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ);
return __rtas_work_area_alloc(sz);
}
I think this would be OK? If I can't make it work I'll fall back to
returning NULL as you suggest, but it will make for more churn (and
risk) in the conversions.
>> + if (!rwa_state->available) {
>> + area = rtas_work_area_alloc_early(size);
>> + goto out;
>> + }
>> +
>> + /*
>> + * To ensure FCFS behavior and prevent a high rate of smaller
>> + * requests from starving larger ones, use the mutex to queue
>> + * allocations.
>> + */
>> + mutex_lock(&rwa_state->mutex);
>> + wait_event(rwa_state->wqh,
>> + (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
>> + mutex_unlock(&rwa_state->mutex);
>> +
>> + area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
>> + *area = (typeof(*area)){
>> + .size = size,
>> + .buf = (char *)addr,
>> + };
>
> That is an odd way to write that :)
yeah I'll change it.
>
>> +out:
>> + pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
>> + (void *)_RET_IP_, __func__, area->buf, area->size);
>
> Can we drop those? They need a recompile to enable, so if someone needs
> debugging they can just rewrite them - or use some sort of tracing
> instead.
Sure.
>> +machine_arch_initcall(pseries, rtas_work_area_allocator_init);
>
> Should it live in platforms/pseries then?
Yeah it probably ought to. I am pretty sure the "work area" construct is
PAPR-specific, and I haven't found any evidence that it's used on
non-pseries.
>> +/**
>> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work areas.
>> + */
>> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
>> +{
>> + const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
>> + const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
>> + const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
>> + const int nid = NUMA_NO_NODE;
>
> This should probably also be restricted to pseries?
Yes.
More information about the Linuxppc-dev
mailing list