[Cbe-oss-dev] Simple work-around found for dmabench with up to 5 SPE's on Fedora 7 (PS3) and 2.6.23-rc4 kernel

Goffredo Marocchi panajev at gmail.com
Sat Sep 1 03:07:35 EST 2007


From dmabench's README.txt:
          --numspes n - specifies that n SPEs should concurrently execute
               the benchmark.  The default is to execute the benchmark on a
               single SPE. When the benchmark is executed on more than one
               SPE, the SPEs are synchronized so that the benchmark code
               starts at roughly the same time on all SPEs.

In practice this also means you should not try to set n > 6 on a
PLAYSTATION 3 (the CELL BE in the PS3 has its 8th SPE disabled for
redundancy reasons, and the 7th SPE is not accessible by the guest OS).

What's the fix? Another debugging-related environment variable?

No, a simple change in the dmabench.c source file.

    #define MAX_SPES        6

Originally the value was 16, which is the maximum you could target
with a dual CELL BE solution like the one you find in IBM's CELL
blades.

With libspe2 and POSIX threads, the number of SPU threads is not
limited by the number of physical SPUs. What the dmabench program does
to validate the number read through the --numspes option is take the
minimum of that value and MAX_SPES:

            case 'n':            /* numspes */
                num_spes = atoi(optarg);
                num_spes = MIN(num_spes, MAX_SPES);
                break;

MAX_SPES is used to set up enough memory and SPU contexts to allow
the program to run even if you wanted to test all 16 SPEs in your
system (if you had such a system, of course).

No matter whether you set --numspes to 1, 2, 3, 4, etc., the program
will allocate memory based on the value specified by MAX_SPES:

    /* Allocate space for parameters for each SPU */
    static dmabench_parms parms[MAX_SPES] __attribute__ ((aligned (16)));

    /* Cache-line sized block for use in barrier calls. */
    static unsigned int bar[CACHE_LINE_SIZE/sizeof(unsigned int)]
        __attribute__ ((aligned (128)));

    /* Buffers in main memory that will be read or written by DMAs */
    static uint64_t tgt_buf[MAX_SPES][NUM_ITER*NREQS*BUFSIZE]
        __attribute__ ((aligned (4096)));

[...]

    int main(int argc, char *argv[])
    {
        spe_gang_context_ptr_t gang = NULL;
        spe_context_ptr_t ctx[MAX_SPES];
        void *ls[MAX_SPES];
        pthread_t thread[MAX_SPES];

[...]

If you can run it with a single SPE, you have enough memory to run
this program with 5-6 SPEs. We are not THAT low on memory on PS3.

I am still not clear (read: have no clue really) why 4 SPEs is the
upper limit before the program hangs the system, why just going to 5
SPEs is enough to do the "annoying" trick, and why this error goes away
when we reduce the MAX_SPES value to 6: all the memory we allocate is
either statically allocated (the file-scope arrays) or sits on main()'s
stack (ctx, ls and thread). None of it is dynamically allocated
memory.

Thanks for listening to all my rants so far :).

Have a nice day,

Goffredo Marocchi