[Cbe-oss-dev] SPE program loading through DMA

Andrew Friedley afriedle at indiana.edu
Thu Oct 30 00:01:11 EST 2008


I tried a while ago to do the same thing you're doing -- except I saw a 
similar hang on the PS3.  I never figured out why.

My solution instead was to memory map the SPU's local store and just 
memcpy() the code over before starting the SPU.  My understanding is 
that this won't be as fast as using DMA, but it works.

Andrew


Yury Serdyuk wrote:
> Hi!
> 
> I would like to ask about the following problem.
> 
> I want to load SPE code into local store in a non-standard way,
> without using the spe_program_load or spe_image_open functions.
> This is needed because my SPE code is emitted dynamically, for example
> by a JIT compiler (this technique is used in the CellDotNet package,
> http://code.google.com/p/celldotnet/).
> 
> Specifically, I try to load the SPE code through DMA by calling
> spe_mfcio_get with a target address of 0.
> 
> The full test code (test1_ppu.c) is the following:
> 
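>> #include "libspe2.h"
>> #include "malloc_align.h"
>> #include "free_align.h"
>> #include <pthread.h>
>> #include <stdio.h>   /* printf, perror */
>> #include <stdlib.h>  /* exit */
>> #include <unistd.h>
>>
>> #define DMAINTENTRYS 32768
>> #define K16 (1024*16)  /* size of one DMA chunk: 16 KB */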
>>
>> void push_dma (spe_context_ptr_t spe, unsigned int target, void* source, int size, int tag) {
>>
>>  /* use a char* so the pointer arithmetic below is standard C */
>>  char* src = (char*) source;
>>
>>  /* issue the transfer in 16 KB chunks */
>>  while (size > K16) {
>>   printf ( "target = %x  source = %lx  size = %d  tag = %d\n", target, (unsigned long) src, size, tag );
>>   spe_mfcio_get(spe,target,src,K16,tag,0,0);
>>   size -=K16;   src += K16;   target += K16;
>>  }
>>  if (size > 0) {
>>   spe_mfcio_get(spe,target,src,size,tag,0,0);
>>  }
>>
>> }
>>
>> int main (int argc, char **argv) {
>>
>>  int* dmafield = (int*) _malloc_align (DMAINTENTRYS*sizeof(int)*2,4);
>>  int* pointer = (int*) _malloc_align (16,4);
>>  int i;
>>
>>  for (i = 0; i < DMAINTENTRYS;i++) {
>>   dmafield[i] = i;
>>   dmafield[i+DMAINTENTRYS]=i+3;
>>  }
>>
>>  spe_context_ptr_t context;
>>
>>  if ((context = spe_context_create(0,NULL)) == NULL) {
>>   perror("Failed creating SPE context");  exit(1);
>>  }
>>
>>  /*loading the dma-field*/
>>
>>  push_dma(context, 0, (void*)dmafield, DMAINTENTRYS*sizeof(int), 9);
>>  printf ( "after push_dma ..." );
>>
>>  /*  spe_mfcio_tag_status_read(context,1<<3,SPE_TAG_ALL,NULL); */
>>  spe_mfcio_tag_status_read(context,0,SPE_TAG_ALL,NULL);
>>
>>  _free_align (dmafield);
>>  _free_align (pointer);
>>  return 0;
>>
>> }
> 
> The corresponding build script is below (note that there is no SPE part
> at all!):
> 
>> #!/bin/sh
>>
>> CELL_BIN="/usr/bin"
>>
>> # SDK 3:
>> INC_PPU="-I/opt/cell/sdk/usr/include/"
>>
>> # remove previously compiled binary
>> rm -f test1
>>
>> # compile PPE code
>> echo "${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c test1_ppu.c"
>> ${CELL_BIN}/ppu-gcc -W -Wall -O3 ${INC_PPU} -c test1_ppu.c
>>
>> ${CELL_BIN}/ppu-gcc -o test1 test1_ppu.o -lspe2
> 
> The problem is that this test works fine on the PlayStation 3
> but does not work on a QS22 blade server:
> 
> 1) PlayStation 3:
> 
>> uname  -a
>>> Linux ps3-gentoo 2.6.24-ps3 #1 SMP Wed Aug 13 00:36:09 JST 2008 ppc64
>>> Cell Broadband Engine, altivec supported GNU/Linux
>>
>> cat /proc/cpuinfo
>>>
>>> processor       : 0
>>> cpu             : Cell Broadband Engine, altivec supported
>>> clock           : 3192.000000MHz
>>> revision        : 16.0 (pvr 0070 1000)
>>>
>>> processor       : 1
>>> cpu             : Cell Broadband Engine, altivec supported
>>> clock           : 3192.000000MHz
>>> revision        : 16.0 (pvr 0070 1000)
>>>
>>> timebase        : 79800000
>>> platform        : PS3
>>  
>>
> Cell SDK Version 3.0.0.0, libspe2 2.2
> 
>> /usr/include/libspe2-types.h
>> /usr/include/libspe2.h
>> /usr/lib/pkgconfig/libspe2.pc
>> /usr/lib/libspe2.a
>> /usr/lib/libspe2.so
>> /usr/lib/libspe2.so.2.2.80
>> /usr/lib/libspe2.so.2
> 
> Output of the test:
> 
>> @ps3-gentoo ~/Desktop/test1/yury/C_Test $ ./test1
>> target = 0  source = f7f8a010  size = 131072  tag = 9
>> target = 4000  source = f7f8e010  size = 114688  tag = 9
>> target = 8000  source = f7f92010  size = 98304  tag = 9
>> target = c000  source = f7f96010  size = 81920  tag = 9
>> target = 10000  source = f7f9a010  size = 65536  tag = 9
>> target = 14000  source = f7f9e010  size = 49152  tag = 9
>> target = 18000  source = f7fa2010  size = 32768  tag = 9
>> after push_dma ...
> 
> 
> 2) QS22:
> 
>> uname -a
>> Linux cell8i-3 2.6.25-14.fc9.ppc64 #1 SMP Thu May 1 05:49:24 EDT 2008 ppc64 ppc64 ppc64 GNU/Linux
> 
>> ]$ cat /proc/cpuinfo
>> processor : 0
>> cpu : Cell Broadband Engine, altivec supported
>> clock : 3200.000000MHz
>> revision : 48.0 (pvr 0070 3000)
>>
>> processor : 1
>> cpu : Cell Broadband Engine, altivec supported
>> clock : 3200.000000MHz
>> revision : 48.0 (pvr 0070 3000)
>>
>> processor : 2
>> cpu : Cell Broadband Engine, altivec supported
>> clock : 3200.000000MHz
>> revision : 48.0 (pvr 0070 3000)
>>
>> processor : 3
>> cpu : Cell Broadband Engine, altivec supported
>> clock : 3200.000000MHz
>> revision : 48.0 (pvr 0070 3000)
>>
>> timebase : 26666666
>> platform : Cell
>> machine : CHRP IBM,0793-4RZ
> 
> Cell SDK Version 3.0.0.0, libspe2 2.2
> 
>> /usr/lib64/libspe2.so
>> /usr/lib64/libspe2.so.2
>> /usr/lib64/libspe2.so.2.2.0
>> /usr/lib64/trace/libspe2.so
>> /usr/lib64/trace/libspe2.so.2
>> /usr/lib64/trace/libspe2.so.2.2.0
>> /usr/lib64/trace/libspe2_.so
> 
> Output of the test:
> 
>> ./test1
>> target = 0 source = 50020 size = 131072 tag = 9
> 
> ... and the program hangs here.
> 
> Exploring the libspe2 sources, I found that the hang occurs in the
> issue_mfc_command function, on the write to the mfc file:
> 
>> struct mfc_command_parameter_area parm = {
>>     .lsa = lsa,
>>     .ea = (unsigned long) ea,
>>     .size = size,
>>     .tag = tag,
>>     .class = (tid << 8) | rid,
>>     .cmd = cmd,
>> };
>> printf ( "before write ...\n" );
>> ret = write(fd, &parm, sizeof (parm)); // HANGING ON !!!
>> printf ( "after write ...\n" );
> 
> So I have two questions:
> 
> 1) What is the difference between the PS3 and the QS22 (or the
> corresponding Linux kernels) that causes this problem?
> 
> 2) Is it possible, in principle, to provide similar functionality in
> libspe2 or the Linux kernel for the QS22? This is very important for
> implementing bytecode languages on Cell.
> 
> Thanks.
> 
> Yury