[Cbe-oss-dev] Simple work-around found for dmabench with up to 5 SPE's on Fedora 7 (PS3) and 2.6.23-rc4 kernel
Goffredo Marocchi
panajev at gmail.com
Sat Sep 1 02:36:03 EST 2007
Kernel compiled from Geoff Levand's git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/geoff/ps3-linux.git
Follow-up to (for those that loved a certain "show" and can catch the
reference I should start this thread like this "Hello, it's Doctor
Shoe again!" ;)):
http://www.ibm.com/developerworks/forums/dw_thread.jsp?forum=739&thread=173629&cat=46
(comment:
[i]not crashing with the SPE_DEBUG_START variable set to 1, not
crashing when MAX_SPES is set to a lower value [the ideal for PS3,
that is 6], leads me to think that somewhere in the program when
MAX_SPES is set to its normal value of 16 and you are not using any
debugging facilities memory is being overwritten or something is not
being properly aligned or maybe both with the first effect caused by
the second with the result of some DMA operations destroying some key
data. It sounds worse than an infinite loop inside the dmabench
program itself because nothing can be done at all to kill it: the
system completely hangs.
The difference caused by the use of SPE_DEBUG_START in the earlier
thread can be thought to be similar to what happens in your everyday
code when you set -O0 and the program runs well and when you set -O2
or -O3 and the program crashes.
The program itself checks the outcome of every function call it makes
[and we do not get any segfaults either: I cannot understand why with
the F-7's stock 2.6.22 PPC kernel this code would segfault while it
does not segfault with the 2.6.23-rc3 or rc4 kernels and yet it hangs
the machine], so it is something escaping every error checking done in
the program and memory errors are some of the worst offenders when it
comes to weird things happening to a program.
Another puzzling part of this "problem" is to note that all the memory
we allocate is statically allocated on the stack. It is not dynamic
memory.[/i]
)
From dmabench's README.txt:
[code] --numspes n - specifies that n SPEs should concurrently execute
    the benchmark. The default is to execute the benchmark on a
    single SPE. When the benchmark is executed on more than one
    SPE, the SPEs are synchronized so that the benchmark code
    starts at roughly the same time on all SPEs.[/code]
In practice this also means you should not try to set n > 6 on a
PLAYSTATION 3 (the CELL BE in the PS3 has its 8th SPE disabled for
redundancy reasons, and the 7th SPE is not accessible by the guest OS).
What's the fix? Another debugging-related environment variable?
No, a simple change in the dmabench.c source file.
[code]#define MAX_SPES 6
[/code]
Originally the value read 16 which would be the maximum you could
target with a dual CELL BE solution like you find in IBM's CELL
blades.
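If you would rather not edit the source for each machine, one option is to guard the define so it can be overridden at build time. This is only a sketch of that idea and not something the original dmabench.c does. The demo_parms struct and the parms_slots() helper are hypothetical stand-ins so that the fragment compiles on its own.

```c
#include <stddef.h>

/* Assumption: the original dmabench.c hard-codes this value.
 * With the guard, building with -DMAX_SPES=6 selects the PS3 limit,
 * and the default stays at the original 16. */
#ifndef MAX_SPES
#define MAX_SPES 16
#endif

/* Hypothetical stand-in for dmabench_parms, only here so the sketch compiles. */
typedef struct {
    unsigned long long ea;
    unsigned int size;
    unsigned int pad;
} demo_parms;

/* Sized by MAX_SPES, like the parms array in dmabench.c. */
static demo_parms parms[MAX_SPES] __attribute__ ((aligned (16)));

/* Hypothetical helper: number of per-SPE slots that were reserved. */
size_t parms_slots(void)
{
    return sizeof(parms) / sizeof(parms[0]);
}
```

With gcc this would be built as something like "gcc -DMAX_SPES=6 ..." on a PS3 and without the flag on a dual CELL BE blade.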
The number of SPU threads, with libspe2 and POSIX threads, is not
limited by the number of physical SPU's. What the dmabench program
does check is that the number read through the --numspes option is
valid, by taking the minimum of that value and the MAX_SPES value.
[code]case 'n': /* numspes */
    num_spes = atoi(optarg);
    num_spes = MIN(num_spes, MAX_SPES);
    break;
[/code]
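Note that atoi() followed by MIN() only enforces the upper bound: a non-numeric argument becomes 0 and a negative one passes through unchanged. A stricter variant might look like the sketch below. parse_num_spes() is a hypothetical helper, not part of dmabench, and falling back to 1 on bad input is my assumption, chosen to match the README's default of a single SPE.

```c
#include <errno.h>
#include <stdlib.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical helper, not part of dmabench: parse a --numspes argument
 * and clamp the result to the range [1, max_spes]. */
int parse_num_spes(const char *arg, int max_spes)
{
    char *end = NULL;
    long value;

    errno = 0;
    value = strtol(arg, &end, 10);

    /* Empty input, non-numeric input, trailing junk or overflow:
     * fall back to a single SPE (assumed default). */
    if (end == arg || *end != '\0' || errno == ERANGE)
        return 1;

    /* Zero or negative counts also fall back to a single SPE. */
    if (value < 1)
        return 1;

    /* Same upper clamp as the original MIN(num_spes, MAX_SPES). */
    return (int)MIN(value, (long)max_spes);
}
```

In the option switch this would replace the two assignments with num_spes = parse_num_spes(optarg, MAX_SPES);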
MAX_SPES is used to set up enough memory and SPU contexts to allow
the program to run even if you wanted to test all 16 SPE's in your
system (if you had such a system, of course).
Whether you set --numspes to 1, 2, 3, 4, etc., the program will
allocate memory based on the value specified by MAX_SPES:
[code]/* Allocate space for parameters for each SPU */
static dmabench_parms parms[MAX_SPES] __attribute__ ((aligned (16)));

/* Cache-line sized block for use in barrier calls. */
static unsigned int bar[CACHE_LINE_SIZE/sizeof(unsigned int)]
    __attribute__ ((aligned (128)));

/* Buffers in main memory that will be read or written by DMAs */
static uint64_t tgt_buf[MAX_SPES][NUM_ITER*NREQS*BUFSIZE]
    __attribute__ ((aligned (4096)));
[...]
int main(int argc, char *argv[])
{
    spe_gang_context_ptr_t gang = NULL;
    spe_context_ptr_t ctx[MAX_SPES];
    void *ls[MAX_SPES];
    pthread_t thread[MAX_SPES];
[...]
[/code]
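To make the scaling concrete, the size of an array declared like tgt_buf depends only on MAX_SPES and the three buffer constants, never on --numspes. The sketch below just does that arithmetic. tgt_buf_bytes() is a hypothetical helper, and since the actual values of NUM_ITER, NREQS and BUFSIZE are not quoted in this post they are left as parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes reserved by an array declared as
 *   uint64_t tgt_buf[max_spes][num_iter * nreqs * bufsize];
 * The dmabench values of num_iter, nreqs and bufsize are not given
 * in this post, so the caller has to supply them. */
size_t tgt_buf_bytes(size_t max_spes, size_t num_iter,
                     size_t nreqs, size_t bufsize)
{
    return max_spes * num_iter * nreqs * bufsize * sizeof(uint64_t);
}
```

Whatever the constants are, going from MAX_SPES 16 to 6 shrinks this buffer by the same 16:6 ratio.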
If you can run it with a single SPE, you have enough memory to run
this program with 5-6 SPE's. We are not THAT low on memory on PS3.
I am still not clear (read: have no clue really) why 4 SPE's is the
upper limit before the program hangs the system, why just using 5
SPE's instead does the "annoying" trick, and why this error goes away
when we reduce the MAX_SPES value to 6: all the memory we allocate is
fixed-size, either static arrays or local arrays on main's stack.
None of it is dynamically allocated.
Thanks for listening to all my rants so far :).
Have a nice day,
Goffredo Marocchi