[Cbe-oss-dev] SPE program loading through DMA

Yury Serdyuk Yury at serdyuk.botik.ru
Sat Oct 25 22:15:33 EST 2008


Hi!

I have a question about loading SPE code.

I want to load SPE code into local store in a non-standard way,
without using the spe_program_load or spe_image_open functions.
I need this because my SPE code is emitted dynamically, for example
by a JIT compiler (this technique is used in the CellDotNet package,
http://code.google.com/p/celldotnet/).

Specifically, I try to load the SPE code through DMA by calling
spe_mfcio_get with a target address of 0.

The full test code (test1_ppu.c) is the following:

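> #include <stdio.h>
> #include <stdlib.h>
> #include "libspe2.h"
> #include "malloc_align.h"
> #include "free_align.h"
> #include <pthread.h>
> #include <unistd.h>
>
> #define DMAINTENTRYS 32768
> #define K16 (1024*16)   /* maximum size of a single MFC DMA transfer */
>
> /* Copy size bytes from source (main memory) to local-store offset target,
>    in pieces of at most 16 KB. */
> void push_dma (spe_context_ptr_t spe, unsigned int target, void* source, int size, int tag) {
>
>  char* src = (char*) source;   /* byte pointer, so that src += K16 is well defined */
>
>  while (size > K16) {
>   printf ( "target = %x  source = %lx  size = %d  tag = %d\n", target, (unsigned long) src, size, tag );
>   spe_mfcio_get(spe,target,src,K16,tag,0,0);
>   size -=K16;   src += K16;   target += K16;
>  }
>  if (size > 0) {
>   spe_mfcio_get(spe,target,src,size,tag,0,0);
>  }
>
> }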
>
> int main (int argc, char **argv) {
>
>  int* dmafield = (int*) _malloc_align (DMAINTENTRYS*sizeof(int)*2,4);
>  int* pointer = (int*) _malloc_align (16,4);
>  int i;
>
>  for (i = 0; i < DMAINTENTRYS;i++) {
>   dmafield[i] = i;
>   dmafield[i+DMAINTENTRYS]=i+3;
>  }
>
>  spe_context_ptr_t context;
>
>  if ((context = spe_context_create(0,NULL)) == NULL) {
>   perror("Failed creating SPE context");  exit(1);
>  }
>
>  /*loading the dma-field*/
>
>  push_dma(context, 0, (void*)dmafield, DMAINTENTRYS*sizeof(int), 9);
>  printf ( "after push_dma ..." );
>
>  /*  spe_mfcio_tag_status_read(context,1<<3,SPE_TAG_ALL,NULL); */
>  spe_mfcio_tag_status_read(context,0,SPE_TAG_ALL,NULL);
>
>  _free_align (dmafield);
>  _free_align (pointer);
>  return 0;
>
> }
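
As a side note, the chunking in push_dma can be checked separately from
libspe2. The following is a minimal sketch in plain C and is not part of
the test above: plan_dma_chunks, struct dma_chunk, dma_tag_mask and
MFC_MAX_DMA_SIZE are hypothetical names introduced here only for
illustration. It computes which (local-store offset, source address, size)
triples would be handed to spe_mfcio_get, and which tag-group mask bit
corresponds to a given tag. It performs no DMA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest single MFC DMA transfer, in bytes (16 KB). */
#define MFC_MAX_DMA_SIZE (16 * 1024)

/* One planned transfer (hypothetical helper type, not a libspe2 type). */
struct dma_chunk {
    uint32_t ls_offset;          /* destination offset in local store */
    const unsigned char *ea;     /* source address in main memory */
    size_t size;                 /* bytes in this transfer */
};

/*
 * Split [source, source + size) into pieces of at most MFC_MAX_DMA_SIZE
 * bytes. At most max_chunks entries are written to out; the return value
 * is the number of pieces needed, which may be larger than max_chunks.
 */
size_t plan_dma_chunks(uint32_t ls_offset, const void *source, size_t size,
                       struct dma_chunk *out, size_t max_chunks)
{
    const unsigned char *ea = source;   /* byte pointer for arithmetic */
    size_t n = 0;

    while (size > 0) {
        size_t len = size < MFC_MAX_DMA_SIZE ? size : MFC_MAX_DMA_SIZE;

        if (n < max_chunks) {
            out[n].ls_offset = ls_offset;
            out[n].ea = ea;
            out[n].size = len;
        }
        n++;
        ls_offset += (uint32_t)len;
        ea += len;
        size -= len;
    }
    return n;
}

/* Bit mask selecting a single tag group: one bit per tag. */
unsigned int dma_tag_mask(unsigned int tag)
{
    assert(tag < 32);            /* keep the shift within the mask width */
    return 1u << tag;
}
```

For the 131072-byte buffer used above, this plans 8 transfers of 16384
bytes each, at local-store offsets 0x0 through 0x1c000, and tag 9
corresponds to the mask bit 0x200.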

The corresponding build script follows (note that there is no SPE part at all!):

> #!/bin/sh
>
> CELL_BIN="/usr/bin"
>
> # SDK 3:
> INC_PPU="-I/opt/cell/sdk/usr/include/"
>
> # remove previously compiled binary
> rm -f test1
>
> # compile PPE code
> echo "${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c test1_ppu.c"
> ${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c test1_ppu.c
>
> ${CELL_BIN}/ppu-gcc -o test1 test1_ppu.o -lspe2

The problem is that this test works fine on a PlayStation 3
but does not work on a QS22 blade server:

1) PlayStation 3:

>uname  -a
>>Linux ps3-gentoo 2.6.24-ps3 #1 SMP Wed Aug 13 00:36:09 JST 2008 ppc64
>>Cell Broadband Engine, altivec supported GNU/Linux
>
>cat /proc/cpuinfo
>>
>>processor       : 0
>>cpu             : Cell Broadband Engine, altivec supported
>>clock           : 3192.000000MHz
>>revision        : 16.0 (pvr 0070 1000)
>>
>>processor       : 1
>>cpu             : Cell Broadband Engine, altivec supported
>>clock           : 3192.000000MHz
>>revision        : 16.0 (pvr 0070 1000)
>>
>>timebase        : 79800000
>>platform        : PS3

Cell SDK Version 3.0.0.0, libspe2 2.2

>/usr/include/libspe2-types.h
>/usr/include/libspe2.h
>/usr/lib/pkgconfig/libspe2.pc
>/usr/lib/libspe2.a
>/usr/lib/libspe2.so
>/usr/lib/libspe2.so.2.2.80
>/usr/lib/libspe2.so.2

Output of the test:

>@ps3-gentoo ~/Desktop/test1/yury/C_Test $ ./test1
>target = 0  source = f7f8a010  size = 131072  tag = 9
>target = 4000  source = f7f8e010  size = 114688  tag = 9
>target = 8000  source = f7f92010  size = 98304  tag = 9
>target = c000  source = f7f96010  size = 81920  tag = 9
>target = 10000  source = f7f9a010  size = 65536  tag = 9
>target = 14000  source = f7f9e010  size = 49152  tag = 9
>target = 18000  source = f7fa2010  size = 32768  tag = 9
>after push_dma ...


2) QS22:

>uname -a
>>Linux cell8i-3 2.6.25-14.fc9.ppc64 #1 SMP Thu May 1 05:49:24 EDT 2008 ppc64 ppc64 ppc64 GNU/Linux
>
>cat /proc/cpuinfo
>>
>>processor       : 0
>>cpu             : Cell Broadband Engine, altivec supported
>>clock           : 3200.000000MHz
>>revision        : 48.0 (pvr 0070 3000)
>>
>>processor       : 1
>>cpu             : Cell Broadband Engine, altivec supported
>>clock           : 3200.000000MHz
>>revision        : 48.0 (pvr 0070 3000)
>>
>>processor       : 2
>>cpu             : Cell Broadband Engine, altivec supported
>>clock           : 3200.000000MHz
>>revision        : 48.0 (pvr 0070 3000)
>>
>>processor       : 3
>>cpu             : Cell Broadband Engine, altivec supported
>>clock           : 3200.000000MHz
>>revision        : 48.0 (pvr 0070 3000)
>>
>>timebase        : 26666666
>>platform        : Cell
>>machine         : CHRP IBM,0793-4RZ

Cell SDK Version 3.0.0.0, libspe2 2.2

>/usr/lib64/libspe2.so
>/usr/lib64/libspe2.so.2
>/usr/lib64/libspe2.so.2.2.0
>/usr/lib64/trace/libspe2.so
>/usr/lib64/trace/libspe2.so.2
>/usr/lib64/trace/libspe2.so.2.2.0
>/usr/lib64/trace/libspe2_.so

Output of the test:

>./test1
>target = 0  source = 50020  size = 131072  tag = 9

... and then the program hangs.

Exploring the libspe2 sources, I found that the hang occurs inside the
issue_mfc_command function, on the write to the mfc file:

> struct mfc_command_parameter_area parm = {
>     .lsa = lsa,
>     .ea = (unsigned long) ea,
>     .size = size,
>     .tag = tag,
>     .class = (tid << 8) | rid,
>     .cmd = cmd,
> };
> printf ( "before write ...\n" );
> ret = write(fd, &parm, sizeof (parm)); // HANGING ON !!!
> printf ( "after write ...\n" );

So I have two questions:

1) What is the difference between the PS3 and the QS22 (or the corresponding
Linux kernels) that causes this problem?

2) Is it possible, in principle, to provide similar functionality in libspe2
or the Linux kernel for the QS22? This is very important for implementing
bytecode languages on Cell.

Thanks.

Yury