[Cbe-oss-dev] SPU-GCC optimization problem

Wed Mar 12 02:32:07 EST 2008

Hello Trevor,

please see Machida-san's bug report below.  It looks like this is a 
problem that came in with your big PS3/FSF backport merge to the 3C 
repository (rev. 770).   Before that, spu_rlmaskqwbyte would use an 
UNSPEC_SPU_ROTQBY unspec which never got optimized.  After that change, 
the intrinsic is open-coded in terms of an LSHIFTRT rtx.

This would normally be perfectly fine, as long as SHIFT_COUNT_TRUNCATED is 
not defined (which it isn't on SPU).   However, there is special CELL 
LOCAL code in simplify-rtx.c:simplify_binary_operation_1 which uses an 
#ifdef SPU hack to *always* truncate shift counts anyway.

There is some comment around this CELL LOCAL section that seems to say 
this is required to fix a bug in some test case.  I'm wondering whether 
this statement is still correct after the rev. 770 changes to the back-end 
...   Reverting this CELL LOCAL change fixes the problem reported below. 
Can you advise whether this is the right thing to do?  Thanks!

Machida-san, thanks for reporting the problem!

Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

-- 
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  GNU compiler/toolchain for Linux on System z and Cell BE
  IBM Deutschland Entwicklung GmbH
  Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: 
Herbert Kircher
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht 
Stuttgart, HRB 243294

"Hiroyuki Machida" <Hiroyuki.Mach at gmail.com> 
Sent by: cbe-oss-dev-bounces+ulrich.weigand=de.ibm.com at ozlabs.org
03/11/08 10:10 AM
Please respond to
Hiroyuki.Mach at gmail.com

To
cbe-oss-dev at ozlabs.org
cc

Subject
[Cbe-oss-dev] SPU-GCC optimization problem

Hi,

I ran into a problem with SPU-GCC in IBM Cell SDK 3.0.
I could not reproduce it with SPU-GCC in the older Cell
SDK 2.1 or 2.0. Details are attached below.

Hiroyuki.

---

* Summary

  spu-gcc generates incorrect code for the spu_rlmaskqwbyte intrinsic
  when:

  -- compiled with "-O1" or higher optimization level and

  -- the second argument is a constant (immediate) value and

  -- 1 <= (the second argument) mod 32 <= 16

* Version

  IBM Cell SDK 3.0 spu-gcc, spu-g++

  IBM Cell SDK 2.1 and earlier don't have this problem.

* Sample code

---
#include <stdio.h>
#include <spu_intrinsics.h>

vector unsigned int source = { 0x11111111, 0x22222222, 0x33333333, 0x44444444, };

int main(int argc, char **argv)
{
  vector unsigned int result;

  /* A count of -17 requests a right shift by 17 bytes. */
  result = spu_rlmaskqwbyte(source, -17);

  /* all elements should be zero. */
  printf("0x%08x 0x%08x 0x%08x 0x%08x, \n",
         spu_extract(result, 0),
         spu_extract(result, 1),
         spu_extract(result, 2),
         spu_extract(result, 3));

  return 0;
}
---

* Additional information

  It seems that when optimization is enabled, the second argument is
  reduced to the range -15 to 0. In the generated code below, the
  count -17 has become -1.

---
main:
        ila     $3,.LC0
        hbrr    .L3,printf
        lqr     $2,source
        stqd    $lr,16($sp)
        stqd    $sp,-32($sp)
        ai      $sp,$sp,-32
        nop     127
        rotqmbyi        $7,$2,-1 # <==========
        ori     $4,$7,0
        rotqbyi $5,$7,(1*4+0)%16
        rotqbyi $6,$7,(2*4+0)%16
        rotqbyi $7,$7,(3*4+0)%16
        nop     127
.L3:
        brsl    $lr,printf
        ai      $sp,$sp,32
        fsmbi   $3,0
        lqd     $lr,16($sp)
        bi      $lr
---