[Cbe-oss-dev] [RFC 4/5] spufs: add kernel support for spu task

Sebastian Siewior bigeasy at linux.vnet.ibm.com
Wed Jun 13 19:17:01 EST 2007


Jeremy Kerr wrote:
> Hi Sebastian,
Jeremy,

>> Utilisation of SPUs by the kernel, main implementation.
>> The idea behind this implemtation is that there are single jobs that
>> are executed asynchronous on the SPU. The user queues jobs with
>> enqueue_for_spu() and gets a callback once the job is completed. The
>> function itself does not block, it returns -EINPROGRESS.
> 
> As Ben said, just return 0 instead.
ack,

> Also, I'm a little confused about the queue API - the user queues a 
> kspu_work_item with this function, and it later gets an 'enqueue' 
> callback, where it may put an entry on the ringbuffer, right?
> 
>> +config SPU_KERNEL_SUPPORT
>> +	tristate "Support for utilisation of SPU by the kernel"
>> +	default m
>> +	depends on SPU_FS
> 
> Maybe this should depend on CONFIG_EXPERIMENTAL?
Sure,


>> +choice
>> +	prompt "Slots in the ringbuffer"
>> +	depends on SPU_KERNEL_SUPPORT
>> +	default SPU_RINB_SLOTS_16
>> +	help
>> +	  This option allows you to determine the number of slots in the
>> +	  ringbuffer. More slots means that more work may be queued on
>> +	  the SPU before waiting for completion. The number of slots must
>> +	  be power of two. Every slot takes approx 128 bytes of memory.
> 
> Does this need to be a config option?

If we can spare, lets say, 16kb for the ring buffer we can make it static.
I noticed later on, that if I change it, the kernel gets recompiled but the
SPU code does not (what is a bug).


>>  SPU_CFLAGS	:= -O2 -Wall -I$(srctree)/include \
>> -		   -I$(objtree)/include2 -D__KERNEL__
>> +		   -I$(objtree)/include2 -D__KERNEL__ -ffreestanding
> 
> What happens without -ffreestanding?

The compiler will assume a default C library and assume that there will
be memcpy() and friends once we link (and not warn you of missing
declarations). Anyway, there is no libc around, so don't let him belive
this.


>> +
>> +struct kspu_code single_spu_code = {
>> +	.code = (unsigned char*) spu_kspu_code,
> 
> Would be good to keep the .code member the same type as the symbol here.
I could change it to unsigned int* since I need this only for memcpy().
What do you thing about:

echo "static const unsigned char $*_code[] " \
   "__attribute__((__aligned__(16))) = {" ; \
   xxd -i -c 10 < $< ; \
   echo "};" ; \

for arch/powerpc/platforms/cell/spufs/Makefile instead?. This will
affect the save/restore code as well. xxd is part of vim (I don't know
what the requirements are for builing spu code in kernel). *I* thing
char* is better than int* but it is up to you.

> 
>> --- /dev/null
>> +++ b/arch/powerpc/platforms/cell/spufs/kspu_helper.c
>> @@ -0,0 +1,574 @@
>> +/*
>> + * Helper function for dealing with kspus
>> + */
> 
> I'd say that these aren't just 'helper' functions, they're the core API. 
> Might be better to call this arch/powerpc/platforms/cell/spusf/kspu.c
Okey.

>> +	if (!ret)
>> +		WARN_ON(1);
>> +	else {
>> +		kfree(kctx->notify_cb_info);
>> +		kfree(kctx);
>> +	}
> 
> Minor one, but keep the bracketing consistent within one if/else block.
okey.

>> +static void set_reg32(struct spu_lscsa *lscsa, int reg, unsigned int 
> p) +{
>> +	lscsa->gprs[reg].slot[0] = p;
>> +}
> 
> Would this be useful for other (non-KSPU) purposes? should we make this 
> a generic SPU function?

I don't think so. In my framework the user should not depend on any
registers. This is expenssive if the spu code is allready running.

>> +	 * |      آ�      |
> 
> I get some whacky character here that puts my mailer in right-to-left 
> text mode.
I move it to the dox folder with utf8

>> +#define BACKCHAIN (kctx->spu_code->data_offset - 16)
> 
> This makes me think that the data_offset member is misnamed, as it has 
> nothing to do with the 'data' section of the KSPU program.
Right.

>> +#define STACK_GAP 176
>> +#define INITIAL_STACK (BACKCHAIN - STACK_GAP)
>> +
>> +	/*
>> +	 * if kspu_data_offset is too large it will overflow not allocated
>> +	 * data will be modofied
>> +	 */
>> +	if (INITIAL_STACK > KSPU_LS_SIZE)
>> +		BUG();
> 
> BUG_ON ?
Looks like it, yes.


>> +	lscsa = ctx->csa.lscsa;
>> +	set_reg32(lscsa, 1, (unsigned int) (INITIAL_STACK));
> 
> we probably don't need the 'lscsa' variable for only one usage.
I used it more often in one of my earlier versions. Now it is only
required for
the stack so I probably get rid of set_reg32() as well.

>> +	spu_release(ctx);
>> +	printk("Stack ready 0x%x08\n", INITIAL_STACK);
> 
> 0x%08x? Also, if you're going to keep these printk()s in a later patch, 
> they'd probably be better as pr_debug()s.
okey.

>> +	ret = -EFBIG;
>> +	printk("SPU's queue: %d elemets, %d bytes each (%d bytes total)\n",
>> +			spu_code->queue_mask+1, spu_code->queue_entr_size,
>> +			(spu_code->queue_mask+1) * spu_code->queue_entr_size);
>> +	ring_space = (spu_code->queue_mask+1) * spu_code->queue_entr_size;
>> +	ring_space += sizeof(struct kspu_ring_data);
>> +	if (spu_code->data_offset != KSPU_LS_SIZE - ring_space) {
>> +		printk("data offset: %04x, ring space: %04x\n",
>> spu_code->data_offset, ring_space);
>> +		printk("data expect: %04x\n", KSPU_LS_SIZE - ring_space);
>> +	} 
>> +
>> +	if (spu_code->code_len > KSPU_LS_SIZE)
>> +		goto err;
> 
> Put the 'ret = -EFBIG' closer to the error check.
okey.


> 
>> +	kctx->notify_cb_info = kzalloc(sizeof(void*) *
> 
>               sizeof(*kctx->notify_cb_info) ----^   ?
ack

>> +
>> +	do {
>> +		ret = spufs_run_spu(ctx, npc, event);
>> +		if (ret & SPU_STATUS_STOPPED_BY_STOP) {
>> +			if (ret >> SPU_STOP_STATUS_SHIFT == 0x1234) {
> 
> We'll need to formalise this stopcode somehow, a #define should do the 
> trick.
This is part of the sync interface, which should not be used in
productive code. This was intended to have a printk() like function on
the SPU side for debugging reasons. I was never desperate enough to
finish it :) If we do want to keep it than yes. We don't need printks in
the async part, do we? I also don't think that we need syscalls in
kernel code at all. So this will end up (probably) with printk only.

>> +
>> +	printk("Entering main loop\n");
>> +
>> +	do {
>> +		fastpath = 0;
> 
> What conditions can we take the fast path? can we rename the variable to 
> reflect this?

We run through the code, do notify(), queue(). If one of those two
functions
did actually something (callbacks for notify or queue were called)
we take the fast path and skip realease ctx mutex + schedule() + get mutex.

>> +static struct kspu_context *kspu_ctx;
> 
> As ben mentioned, we could have one per physical SPE, or even make this 
> configurable? or create contexts as required?
As mentioned to ben, I don't know (right now) how many physical SPUs are
available (in terms of total and not too busy with user space).
I don't want steal any SPUs from user space that's why I try to glue
code together and try to avoid context switches.

Create as required sounds good to me. This involves some enhancements to
the scheduler I guess.
When you create ctx on demand you have to consider one little thing from
kspu side (atleast it is a requirement by AES):
You have one crypto user that does AES-CBC.That user enqueues two
encryption requests. You can't put request one on SPU A ond request two
on SPU B because request two depends on the result of one. It is not a
problem with AES in ECB mode. You could split on two SPUs if you have
two crypto users.

> 
>> Index: b/arch/powerpc/platforms/cell/spufs/spu_main.c
>> ===================================================================
>> --- /dev/null
>> +++ b/arch/powerpc/platforms/cell/spufs/spu_main.c
>> @@ -0,0 +1,35 @@
>> +/*
>> + * This code can be considered as crt0.S
>> + * Compile with -O[123S] and make sure that here is only one
>> function + * that starts at 0x0
>> + */
>> +#include <spu_mfcio.h>
>> +
>> +#include <asm/kspu/merged_code.h>
>> +
>> +spu_operation spu_ops[TOTAL_SPU_FUNCS] __attribute__((aligned(16)))
> 
> ops vs. funcs ?
func is the right one.


>> = { +
>> +};
>> +
>> +#define DMA_TAG 1
>> +void _start(void) __attribute__((noreturn));
>> +void _start(void)
>> +{
>> +	struct kernel_spu_data *spu_data;
>> +
>> +	mfc_write_tag_mask(1<< DMA_TAG);
> 
> Why do you set the tag here ?

Left oever from the past. I didn't get double buffer to work. Once it
does, it should disappear.

> 
>> +	spu_data = (struct kernel_spu_data*) KERNEL_SPU_DATA_OFFSET;
>> +
>> +	while (42) {
>> +
>> +		unsigned int consumed = spu_data->kspu_ring_data.consumed;
>> +		unsigned int outstanding = spu_data->kspu_ring_data.outstanding;
>> +
>> +		if (consumed != outstanding) {
>> +			int cons = consumed & (RB_SLOTS-1);
>> +
>> +			spu_ops[spu_data->work_item[cons].operation](cons);
>> +		} else
>> +			spu_stop(0x99);
> 
> We should probably #define this.
The stop is not important right now. I has just to stop.

> 
>> Index: b/include/asm-powerpc/kspu/merged_code.h
>> ===================================================================
>> --- /dev/null
>> +++ b/include/asm-powerpc/kspu/merged_code.h
>> @@ -0,0 +1,46 @@
>> +#ifndef KSPU_MERGED_CODE_H
>> +#define KSPU_MERGED_CODE_H
>> +#include <linux/autoconf.h>
>> +
>> +#define KSPU_LS_SIZE 0x40000
> 
> This is defined in spu.h:
> 
> include/asm-powerpc/spu.h:#define LS_SIZE (256 * 1024)
> 
> Can we somehow share this with the SPE-application header?

We got first fix this:

In file included from arch/powerpc/platforms/cell/spufs/spu_main.c:9:
/home/bigeasy/kernel/ps3-linux/include/asm/spu.h:74:1: warning:
"MFC_SYNC_CMD" redefined
In file included from arch/powerpc/platforms/cell/spufs/spu_main.c:6:
/usr/lib/gcc/spu/4.1.1/include/spu_mfcio.h:121:1: warning: this is the
location of the previous definition
In file included from /home/bigeasy/kernel/ps3-linux/include/asm/mmu.h:8,
                 from
/home/bigeasy/kernel/ps3-linux/include/asm/lppaca.h:33,
                 from /home/bigeasy/kernel/ps3-linux/include/asm/paca.h:21,
                 from
/home/bigeasy/kernel/ps3-linux/include/asm/hw_irq.h:18,
                 from
/home/bigeasy/kernel/ps3-linux/include/asm/system.h:10,
                 from
/home/bigeasy/kernel/ps3-linux/include/linux/list.h:10,
                 from
/home/bigeasy/kernel/ps3-linux/include/linux/timer.h:5,
                 from
/home/bigeasy/kernel/ps3-linux/include/linux/workqueue.h:9,
                 from /home/bigeasy/kernel/ps3-linux/include/asm/spu.h:28,
                 from arch/powerpc/platforms/cell/spufs/spu_main.c:10:
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h: In function
'hpte_encode_r':
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:216: warning:
integer constant is too large for 'unsigned long' type
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h: In function
'hpt_hash':
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:231: warning:
integer constant is too large for 'unsigned long' type
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h: In function
'vsid_scramble':
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:387: warning:
right shift count >= width of type
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:387: warning:
left shift count >= width of type
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:388: warning:
right shift count >= width of type
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:388: warning:
left shift count >= width of type
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h: In function
'get_kernel_vsid':
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:395: error:
'SID_SHIFT' undeclared (first use in this function)
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:395: error:
(Each undeclared identifier is reported only once
/home/bigeasy/kernel/ps3-linux/include/asm/mmu-hash64.h:395: error: for
each function it appears in.)

I think thats the reason why the spu save/restore also defines it.

> 
> Cheers,
cheers,

> 
> 
> Jeremy
> 
Sebastian



More information about the cbe-oss-dev mailing list