[Cbe-oss-dev] SPE program loading through DMA
Yury Serdyuk
Yury at serdyuk.botik.ru
Sat Oct 25 22:15:33 EST 2008
Hi!
I would like to ask about the following problem.
I want to load SPE code into local store in a non-standard way,
without using the spe_program_load or spe_image_open functions.
This is needed because my SPE code is emitted dynamically, for example
by a JIT compiler (this technique is used in the CellDotNet package,
http://code.google.com/p/celldotnet/).
Specifically, I try to load the SPE code through DMA by calling spe_mfcio_get
with the target (local store) address equal to 0.
The full test code (test1_ppu.c) is the following:
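> #include "libspe2.h"
> #include "malloc_align.h"
> #include "free_align.h"
> #include <pthread.h>
> #include <stdio.h>   /* printf, perror */
> #include <stdlib.h>  /* exit */
> #include <unistd.h>
>
> #define DMAINTENTRYS 32768
> #define K16 (1024*16)  /* maximum size of a single DMA transfer */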
>
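> void push_dma (spe_context_ptr_t spe, unsigned int target, void* source,
>                int size, int tag) {
>     char *src = (char *) source;  /* byte pointer for portable arithmetic */
>
>     /* issue full 16 KB transfers while more than 16 KB remains */
>     while (size > K16) {
>         printf ( "target = %x source = %lx size = %d tag = %d\n", target,
>                  (unsigned long) src, size, tag );
>         spe_mfcio_get(spe,target,src,K16,tag,0,0);
>         size -= K16; src += K16; target += K16;
>     }
>     /* final transfer of at most 16 KB (not printed) */
>     if (size > 0) {
>         spe_mfcio_get(spe,target,src,size,tag,0,0);
>     }
> }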
>
> int main (int argc, char **argv) {
>
>     /* second argument of _malloc_align is log2 of the alignment (16 bytes) */
>     int* dmafield = (int*) _malloc_align (DMAINTENTRYS*sizeof(int)*2,4);
>     int* pointer = (int*) _malloc_align (16,4);
>     int i;
>
>     for (i = 0; i < DMAINTENTRYS;i++) {
>         dmafield[i] = i;
>         dmafield[i+DMAINTENTRYS]=i+3;
>     }
>
>     spe_context_ptr_t context;
>
>     if ((context = spe_context_create(0,NULL)) == NULL) {
>         perror("Failed creating SPE context"); exit(1);
>     }
>
>     /* loading the dma-field to local store address 0 */
>
>     push_dma(context, 0,
>              (void*)dmafield,DMAINTENTRYS*sizeof(int),9);
>     printf ( "after push_dma ..." );
>
>     /* spe_mfcio_tag_status_read(context,1<<3,SPE_TAG_ALL,NULL); */
>     spe_mfcio_tag_status_read(context,0,SPE_TAG_ALL,NULL);
>
>     _free_align (dmafield);
>     _free_align (pointer);
>     return 0;
>
> }
The corresponding build script is below (note that there is no SPE part at all!):
> #!/bin/sh
>
> CELL_BIN="/usr/bin"
>
> # SDK 3:
> INC_PPU="-I/opt/cell/sdk/usr/include/"
>
> # remove previously compiled binary
> rm -f test1
>
> # compile PPE code
> echo "${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c test1_ppu.c"
> ${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c test1_ppu.c
>
> ${CELL_BIN}/ppu-gcc -o test1 test1_ppu.o -lspe2
The problem is that this test works fine on a PlayStation 3,
but hangs on a QS22 blade server:
1) PlayStation 3:
>uname -a
>>Linux ps3-gentoo 2.6.24-ps3 #1 SMP Wed Aug 13 00:36:09 JST 2008 ppc64
>>Cell Broadband Engine, altivec supported GNU/Linux
>
>cat /proc/cpuinfo
>>
>>processor : 0
>>cpu : Cell Broadband Engine, altivec supported
>>clock : 3192.000000MHz
>>revision : 16.0 (pvr 0070 1000)
>>
>>processor : 1
>>cpu : Cell Broadband Engine, altivec supported
>>clock : 3192.000000MHz
>>revision : 16.0 (pvr 0070 1000)
>>
>>timebase : 79800000
>>platform : PS3
>
>
Cell SDK Version 3.0.0.0, libspe2 2.2
>/usr/include/libspe2-types.h
>/usr/include/libspe2.h
>/usr/lib/pkgconfig/libspe2.pc
>/usr/lib/libspe2.a
>/usr/lib/libspe2.so
>/usr/lib/libspe2.so.2.2.80
>/usr/lib/libspe2.so.2
Output of the test:
>@ps3-gentoo ~/Desktop/test1/yury/C_Test $ ./test1
>target = 0 source = f7f8a010 size = 131072 tag = 9
>target = 4000 source = f7f8e010 size = 114688 tag = 9
>target = 8000 source = f7f92010 size = 98304 tag = 9
>target = c000 source = f7f96010 size = 81920 tag = 9
>target = 10000 source = f7f9a010 size = 65536 tag = 9
>target = 14000 source = f7f9e010 size = 49152 tag = 9
>target = 18000 source = f7fa2010 size = 32768 tag = 9
>after push_dma ...
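Note that the log above has seven lines although eight transfers are issued: push_dma prints only inside its while loop, and the last 16 KB piece (target 1c000) is sent by the trailing if without a printf. The chunk arithmetic can be checked without any Cell hardware. The following is a minimal sketch; plan_chunks and struct dma_chunk are hypothetical names introduced here for illustration and are not part of libspe2.

```c
#include <assert.h>
#include <stddef.h>

#define DMA_MAX_CHUNK (16 * 1024)  /* largest single DMA transfer, as K16 above */

/* One planned transfer: local-store address, byte offset into the
 * source buffer, and length in bytes. */
struct dma_chunk {
    unsigned int ls_offset;
    size_t       src_offset;
    size_t       length;
};

/* Hypothetical helper (not a libspe2 function): split `size` bytes that
 * should land at local-store address `target` into pieces of at most
 * 16 KB, mirroring the loop in push_dma. Returns the number of chunks
 * written to `out`, or 0 if `out` is too small or `size` is 0. */
size_t plan_chunks(unsigned int target, size_t size,
                   struct dma_chunk *out, size_t max_chunks)
{
    size_t n = 0, done = 0;

    assert(out != NULL);
    while (done < size) {
        size_t len = size - done;
        if (len > DMA_MAX_CHUNK)
            len = DMA_MAX_CHUNK;
        if (n == max_chunks)
            return 0;  /* caller's array cannot hold the whole plan */
        out[n].ls_offset  = target + (unsigned int) done;
        out[n].src_offset = done;
        out[n].length     = len;
        n++;
        done += len;
    }
    return n;
}
```

For the test above (32768 ints, i.e. 131072 bytes, to local-store address 0) this plans eight chunks of 16384 bytes each, the last one at local-store address 0x1c000.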
2) QS22:
>uname -a
>>Linux cell8i-3 2.6.25-14.fc9.ppc64 #1 SMP Thu May 1 05:49:24 EDT 2008
>>ppc64 ppc64 ppc64 GNU/Linux
>
>cat /proc/cpuinfo
>>
>>processor : 0
>>cpu : Cell Broadband Engine, altivec supported
>>clock : 3200.000000MHz
>>revision : 48.0 (pvr 0070 3000)
>>
>>processor : 1
>>cpu : Cell Broadband Engine, altivec supported
>>clock : 3200.000000MHz
>>revision : 48.0 (pvr 0070 3000)
>>
>>processor : 2
>>cpu : Cell Broadband Engine, altivec supported
>>clock : 3200.000000MHz
>>revision : 48.0 (pvr 0070 3000)
>>
>>processor : 3
>>cpu : Cell Broadband Engine, altivec supported
>>clock : 3200.000000MHz
>>revision : 48.0 (pvr 0070 3000)
>>
>>timebase : 26666666
>>platform : Cell
>>machine : CHRP IBM,0793-4RZ
Cell SDK Version 3.0.0.0, libspe2 2.2
>/usr/lib64/libspe2.so
>/usr/lib64/libspe2.so.2
>/usr/lib64/libspe2.so.2.2.0
>/usr/lib64/trace/libspe2.so
>/usr/lib64/trace/libspe2.so.2
>/usr/lib64/trace/libspe2.so.2.2.0
>/usr/lib64/trace/libspe2_.so
Output of the test:
> ./test1
> target = 0 source = 50020 size = 131072 tag = 9
.... and here the program hangs ...
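One thing that can be checked at this point is whether the request itself is well formed. As far as I understand the MFC rules, a transfer must be 1, 2, 4, 8 or a multiple of 16 bytes, at most 16 KB, with both addresses 16-byte aligned for transfers of 16 bytes or more (smaller transfers must be naturally aligned and share the low 4 address bits). A minimal sketch of such a check follows; dma_params_ok is a hypothetical helper, not a libspe2 function, and rejecting zero-length transfers is a choice of this sketch only.

```c
#include <assert.h>   /* for callers that assert on the result */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of libspe2): returns true if a single
 * DMA request with local-store address `lsa`, effective address `ea`
 * and length `size` satisfies the size and alignment rules stated in
 * the text above. */
bool dma_params_ok(uint32_t lsa, uint64_t ea, uint32_t size)
{
    /* Assumption of this sketch: zero-length requests are rejected. */
    if (size == 0 || size > 16 * 1024)
        return false;

    if (size >= 16) {
        /* quadword transfers: multiple of 16, both ends 16-byte aligned */
        return size % 16 == 0 && lsa % 16 == 0 && ea % 16 == 0;
    }

    /* small transfers: 1, 2, 4 or 8 bytes, naturally aligned,
     * and identical low 4 bits in both addresses */
    if (size != 1 && size != 2 && size != 4 && size != 8)
        return false;
    return lsa % size == 0 && ea % size == 0 && (lsa & 15) == (ea & 15);
}
```

By this check the first QS22 transfer (local-store address 0, source 0x50020, 16384 bytes) passes, as does the first PS3 transfer (source 0xf7f8a010), so a misaligned buffer does not appear to explain the difference between the two machines.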
Exploring the libspe2 sources, I found that the hang occurs inside the
issue_mfc_command function, on the write to the mfc file:
> struct mfc_command_parameter_area parm = {
>     .lsa = lsa,
>     .ea = (unsigned long) ea,
>     .size = size,
>     .tag = tag,
>     .class = (tid << 8) | rid,
>     .cmd = cmd,
> };
> printf ( "before write ...\n" );
> ret = write(fd, &parm, sizeof (parm)); // HANGING ON !!!
> printf ( "after write ...\n" );
So I have two questions:
1) What is the difference between the PS3 and the QS22 (or the corresponding Linux kernels)
that causes this problem?
2) Is it possible, in principle, to provide similar functionality in libspe2 or the Linux kernel
for the QS22? This is very important for implementing bytecode languages on Cell.
Thanks.
Yury