[PATCH v2 11/19] powerpc/rtas: add work area allocator

Nathan Lynch nathanl at linux.ibm.com
Thu Feb 9 01:48:01 AEDT 2023


Michael Ellerman <mpe at ellerman.id.au> writes:
> Nathan Lynch via B4 Submission Endpoint
> <devnull+nathanl.linux.ibm.com at kernel.org> writes:
>> diff --git a/arch/powerpc/include/asm/rtas-work-area.h b/arch/powerpc/include/asm/rtas-work-area.h
>> new file mode 100644
>> index 000000000000..76ccb039cc37
>> --- /dev/null
>> +++ b/arch/powerpc/include/asm/rtas-work-area.h
>> @@ -0,0 +1,45 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +#ifndef POWERPC_RTAS_WORK_AREA_H
>> +#define POWERPC_RTAS_WORK_AREA_H
>
> The usual style would be _ASM_POWERPC_RTAS_WORK_AREA_H.

OK. (will change in all new headers)

>> +static struct rtas_work_area_allocator_state {
>> +	struct gen_pool *gen_pool;
>> +	char *arena;
>> +	struct mutex mutex; /* serializes allocations */
>> +	struct wait_queue_head wqh;
>> +	mempool_t descriptor_pool;
>> +	bool available;
>> +} rwa_state_ = {
>> +	.mutex = __MUTEX_INITIALIZER(rwa_state_.mutex),
>> +	.wqh = __WAIT_QUEUE_HEAD_INITIALIZER(rwa_state_.wqh),
>> +};
>> +static struct rtas_work_area_allocator_state *rwa_state = &rwa_state_;
>
> I assumed the pointer was so you could swap this out at runtime or
> something, but I don't think you do.
>
> Any reason not to drop the pointer and just use rwa_state.foo accessors?
> That would also allow the struct to be anonymous.
>
> Or if you have the pointer you can at least make it NULL prior to init
> and avoid the need for "available".

I think it's there because earlier versions of this that I never posted
had unit tests. I'll either resurrect those or reduce the indirection.


>> +/*
>> + * A single work area buffer and descriptor to serve requests early in
>> + * boot before the allocator is fully initialized.
>> + */
>> +static bool early_work_area_in_use __initdata;
>> +static char early_work_area_buf[SZ_4K] __initdata;
>
> That should be page aligned I think?

Yes. It happens to be safe in this version because ibm,get-system-parameter,
which has no alignment requirement, is the only function used early
enough to use the buffer. But that's too fragile.


>> +static struct rtas_work_area early_work_area __initdata = {
>> +	.buf = early_work_area_buf,
>> +	.size = sizeof(early_work_area_buf),
>> +};
>> +
>> +
>> +static struct rtas_work_area * __init rtas_work_area_alloc_early(size_t size)
>> +{
>> +	WARN_ON(size > early_work_area.size);
>> +	WARN_ON(early_work_area_in_use);
>> +	early_work_area_in_use = true;
>> +	memset(early_work_area.buf, 0, early_work_area.size);
>> +	return &early_work_area;
>> +}
>> +
>> +static void __init rtas_work_area_free_early(struct rtas_work_area *work_area)
>> +{
>> +	WARN_ON(work_area != &early_work_area);
>> +	WARN_ON(!early_work_area_in_use);
>> +	early_work_area_in_use = false;
>> +}
>> +
>> +struct rtas_work_area * __ref rtas_work_area_alloc(size_t size)
>> +{
>> +	struct rtas_work_area *area;
>> +	unsigned long addr;
>> +
>> +	might_sleep();
>> +
>> +	WARN_ON(size > RTAS_WORK_AREA_MAX_ALLOC_SZ);
>> +	size = min_t(size_t, size, RTAS_WORK_AREA_MAX_ALLOC_SZ);
>
> This seems unsafe.
>
> If you return a buffer smaller than the caller asks for they're likely
> to read/write past the end of it and corrupt memory.

OK, let's figure out another way to handle this.

> AFAIK genalloc doesn't have guard pages or anything fancy to save us
> from that - but maybe I'm wrong, I've never used it.

Yeah we would have to build our own thing on top of it. And I don't
think it could be something that traps on access, it would have to be a
check in rtas_work_area_free(), after the fact.

> There's only three callers in the end, seems like we should just return
> NULL if the size is too large and have callers check the return value.

There are more conversions to do, and a property I hope to maintain is
that requests can't fail. Existing users of rtas_data_buf don't have
error paths for failure to acquire the buffer.

I believe the allocation size passed to rtas_work_area_alloc() can be
known at build time in all cases. Maybe we could prevent inappropriate
requests from being built with a compile-time assertion (untested):

/* rtas-work-area.h */

static inline struct rtas_work_area *rtas_work_area_alloc(size_t sz)
{
	static_assert(sz < RTAS_WORK_AREA_MAX_ALLOC_SZ);
        return __rtas_work_area_alloc(sz);
}

I think this would be OK? If I can't make it work I'll fall back to
returning NULL as you suggest, but it will make for more churn (and
risk) in the conversions.


>> +	if (!rwa_state->available) {
>> +		area = rtas_work_area_alloc_early(size);
>> +		goto out;
>> +	}
>> +
>> +	/*
>> +	 * To ensure FCFS behavior and prevent a high rate of smaller
>> +	 * requests from starving larger ones, use the mutex to queue
>> +	 * allocations.
>> +	 */
>> +	mutex_lock(&rwa_state->mutex);
>> +	wait_event(rwa_state->wqh,
>> +		   (addr = gen_pool_alloc(rwa_state->gen_pool, size)) != 0);
>> +	mutex_unlock(&rwa_state->mutex);
>> +
>> +	area = mempool_alloc(&rwa_state->descriptor_pool, GFP_KERNEL);
>> +	*area = (typeof(*area)){
>> +		.size = size,
>> +		.buf = (char *)addr,
>> +	};
>
> That is an odd way to write that :)

yeah I'll change it.

>
>> +out:
>> +	pr_devel("%ps -> %s() -> buf=%p size=%zu\n",
>> +		 (void *)_RET_IP_, __func__, area->buf, area->size);
>
> Can we drop those? They need a recompile to enable, so if someone needs
> debugging they can just rewrite them - or use some sort of tracing
> instead.

Sure.


>> +machine_arch_initcall(pseries, rtas_work_area_allocator_init);
>
> Should it live in platforms/pseries then?

Yeah it probably ought to. I am pretty sure the "work area" construct is
PAPR-specific, and I haven't found any evidence that it's used on
non-pseries.


>> +/**
>> + * rtas_work_area_reserve_arena() - reserve memory suitable for RTAS work areas.
>> + */
>> +int __init rtas_work_area_reserve_arena(const phys_addr_t limit)
>> +{
>> +	const phys_addr_t align = RTAS_WORK_AREA_ARENA_ALIGN;
>> +	const phys_addr_t size = RTAS_WORK_AREA_ARENA_SZ;
>> +	const phys_addr_t min = MEMBLOCK_LOW_LIMIT;
>> +	const int nid = NUMA_NO_NODE;
>
> This should probably also be restricted to pseries?

Yes.


More information about the Linuxppc-dev mailing list