[Cbe-oss-dev] SPU-GCC optimization problem
Ulrich Weigand
Ulrich.Weigand at de.ibm.com
Wed Mar 12 02:32:07 EST 2008
Hello Trevor,
please see Machida-san's bug report below. It looks like this is a
problem that came in with your big PS3/FSF backport merge to the 3C
repository (rev. 770). Before that, spu_rlmaskqwbyte would use an
UNSPEC_SPU_ROTQBY unspec which never got optimized. After that change,
the intrinsic is open-coded in terms of an LSHIFTRT rtx.
This would be perfectly fine normally, as long as SHIFT_COUNT_TRUNCATED is
not defined (which it isn't on spu). However, there is special CELL
LOCAL code in simplify-rtx.c:simplify_binary_operation_1 which uses an
#ifdef SPU hack to *always* truncate shift counts anyway.
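To make the effect of truncating a shift count concrete, here is a rough
model in plain C. It is only a sketch, not GCC's code: the u128 type and
the three function names are made up for illustration, and it assumes the
byte count 17 reaches the simplifier as a bit count of 136 on a 128-bit
value and that truncation means reducing the count modulo 128.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value: hi holds the upper 64 bits. */
typedef struct { uint64_t hi, lo; } u128;

/* Logical right shift by a count already known to be in 0..127. */
static u128 shift_in_range(u128 v, unsigned bits)
{
    u128 r;
    assert(bits < 128);
    if (bits == 0)
        return v;
    if (bits >= 64) {
        r.hi = 0;
        r.lo = v.hi >> (bits - 64);
        return r;
    }
    r.hi = v.hi >> bits;
    r.lo = (v.lo >> bits) | (v.hi << (64 - bits));
    return r;
}

/* Count left alone: shifting out all 128 bits leaves zero, which is
   the result the intrinsic is expected to give here. */
static u128 lshiftrt_untruncated(u128 v, unsigned bits)
{
    u128 zero = {0, 0};
    return bits < 128 ? shift_in_range(v, bits) : zero;
}

/* Count truncated first (assumed: modulo the 128-bit width). */
static u128 lshiftrt_truncated(u128 v, unsigned bits)
{
    return shift_in_range(v, bits & 127u);
}
```

With the value 0x11111111222222223333333344444444 and a count of 136, the
untruncated version gives zero, while the truncated version shifts by
only 8 bits (one byte), which matches the rotqmbyi ...,-1 seen in the
generated code.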
A comment near this CELL LOCAL section seems to say that it is required
to fix a bug in some test case. I'm wondering whether that is still true
after the rev. 770 changes to the back-end: reverting the CELL LOCAL
change fixes the problem reported below.
Can you advise whether this is the right thing to do? Thanks!
Machida-san, thanks for reporting the problem!
Mit freundlichen Gruessen / Best Regards
Ulrich Weigand
--
Dr. Ulrich Weigand | Phone: +49-7031/16-3727
GNU compiler/toolchain for Linux on System z and Cell BE
IBM Deutschland Entwicklung GmbH
Chairman of the Supervisory Board: Martin Jetter | Management:
Herbert Kircher
Registered office: Böblingen | Commercial register: Amtsgericht
Stuttgart, HRB 243294
"Hiroyuki Machida" <Hiroyuki.Mach at gmail.com>
Sent by: cbe-oss-dev-bounces+ulrich.weigand=de.ibm.com at ozlabs.org
03/11/08 10:10 AM
Please respond to: Hiroyuki.Mach at gmail.com
To: cbe-oss-dev at ozlabs.org
cc:
Subject: [Cbe-oss-dev] SPU-GCC optimization problem
Hi,
I ran into a problem with SPU-GCC in IBM Cell SDK 3.0.
I could not reproduce it with SPU-GCC in the older Cell
SDK 2.1 or 2.0. Details are below.
Hiroyuki.
---
* Summary
spu-gcc generates incorrect code for the spu_rlmaskqwbyte intrinsic
when all of the following hold:
-- the code is compiled with "-O1" or a higher optimization level, and
-- the second argument is a constant (immediate) value, and
-- 1 <= (the second argument) mod 32 <= 16, where mod is the
   non-negative remainder (for example, -17 mod 32 = 15)
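The third condition can be checked with a small model in plain C. This is
a sketch under assumptions, not SPU code: the function names are invented
for illustration, "expected" follows my reading of the rotqmbyi
definition (shift count = (0 - argument) mod 32, and a count of 16 or
more clears the quadword), and "observed" assumes the compiler reduces
that count to 0..15.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same bytes as "source" in the sample code; byte 0 is the most
   significant byte of the quadword. */
static const uint8_t SOURCE[16] = {
    0x11, 0x11, 0x11, 0x11, 0x22, 0x22, 0x22, 0x22,
    0x33, 0x33, 0x33, 0x33, 0x44, 0x44, 0x44, 0x44
};

/* Shift right by n bytes (0..15), filling with zeros on the left.
   dst and src must not overlap. */
static void shift_right_bytes(uint8_t dst[16], const uint8_t src[16],
                              unsigned n)
{
    assert(n < 16);
    memset(dst, 0, 16);
    memcpy(dst + n, src, 16 - n);
}

/* Expected result: counts 16..31 clear the whole quadword. */
static void rlmaskqwbyte_expected(uint8_t dst[16], const uint8_t src[16],
                                  int arg)
{
    unsigned count = (0u - (unsigned)arg) & 31u;
    if (count < 16)
        shift_right_bytes(dst, src, count);
    else
        memset(dst, 0, 16);
}

/* Assumed model of the miscompiled result: count reduced to 0..15. */
static void rlmaskqwbyte_observed(uint8_t dst[16], const uint8_t src[16],
                                  int arg)
{
    shift_right_bytes(dst, src, (0u - (unsigned)arg) & 15u);
}

/* The condition from the summary, using the non-negative remainder. */
static int in_reported_range(int arg)
{
    int m = ((arg % 32) + 32) % 32;
    return 1 <= m && m <= 16;
}

/* Nonzero if the two models disagree for this argument. */
static int results_differ(int arg)
{
    uint8_t want[16], got[16];
    rlmaskqwbyte_expected(want, SOURCE, arg);
    rlmaskqwbyte_observed(got, SOURCE, arg);
    return memcmp(want, got, 16) != 0;
}
```

Under these assumptions, results_differ(arg) agrees with
in_reported_range(arg) for every argument I tried, and an argument of -17
gives a one-byte shift in the "observed" model instead of all zeros.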
* Version
IBM Cell SDK 3.0 spu-gcc, spu-g++
IBM Cell SDK 2.1 and earlier do not have this problem.
* Sample code
---
#include <stdio.h>              /* for printf */
#include <spu_intrinsics.h>

vector unsigned int source = { 0x11111111, 0x22222222, 0x33333333,
                               0x44444444, };

int main(int argc, char **argv)
{
    vector unsigned int result;

    result = spu_rlmaskqwbyte(source, -17);
    /* all elements should be zero. */
    printf("0x%08x 0x%08x 0x%08x 0x%08x, \n",
           spu_extract(result, 0),
           spu_extract(result, 1),
           spu_extract(result, 2),
           spu_extract(result, 3));
    return 0;
}
---
* Additional information
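It seems that when optimization is enabled, the second argument is
reduced to the range -15 to 0. In the generated code below, the -17
from the sample has become -1 in the marked rotqmbyi instruction.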
---
main:
	ila	$3,.LC0
	hbrr	.L3,printf
	lqr	$2,source
	stqd	$lr,16($sp)
	stqd	$sp,-32($sp)
	ai	$sp,$sp,-32
	nop	127
	rotqmbyi	$7,$2,-1	# <==========
	ori	$4,$7,0
	rotqbyi	$5,$7,(1*4+0)%16
	rotqbyi	$6,$7,(2*4+0)%16
	rotqbyi	$7,$7,(3*4+0)%16
	nop	127
.L3:
	brsl	$lr,printf
	ai	$sp,$sp,32
	fsmbi	$3,0
	lqd	$lr,16($sp)
	bi	$lr
---