[Linuxppc-users] OpenMPI deviations & Power hardware bounds checking

Mon Jun 12 05:12:24 AEST 2017

I responded for GCC.  Unfortunately, this functionality currently exists only for gfortran, at least per the documentation.

-- Bill

Bill Schmidt, Ph.D.
GCC for Linux on Power
Linux on Power Toolchain
IBM Linux Technology Center
wschmidt at linux.vnet.ibm.com

> On Jun 11, 2017, at 1:26 PM, Michele Richens <mrichens at us.ibm.com> wrote:
> 
> Ian McIntosh --- Re: [Linuxppc-users] OpenMPI deviations & Power hardware bounds checking ---
> 
> From:	"Ian McIntosh" <ianm at ca.ibm.com>
> To:	"Nicole Trudeau" <nitrud at ca.ibm.com>
> Cc:	linuxppc-users at lists.ozlabs.org
> Date:	Sun, Jun 11, 2017 1:24 PM
> Subject:	Re: [Linuxppc-users] OpenMPI deviations & Power hardware bounds checking
> 
> I posted an answer to the second question, for XL compilers. Someone else might want to post a GCC or an LLVM answer?
> 
> Post
> 
> The XL compilers for IBM POWER processors, available on little-endian Linux, big-endian Linux, and AIX, have a different implementation of array bounds checking.
> The -qcheck option, or its synonym -C, turns on various kinds of checking; -qcheck=bounds checks array bounds. With it enabled, the compilers check that every array reference has a valid subscript.
> The hardware instruction used is a conditional trap, which compares the subscript to the upper limit and traps if the subscript is too large or too small. In C and C++ the lower limit is 0. In Fortran it defaults to 1 but can be any integer. When the lower limit is not zero, it is subtracted from the subscript being checked, and the check compares the result to the upper limit minus the lower limit.
> When the limit is known at compile time and small enough, a conditional trap immediate instruction is enough. When the limit is calculated at execution time or is greater than 65535, a conditional trap instruction comparing two registers is needed. 
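A minimal sketch, not taken from the original post, of the arithmetic described above. The helper names are hypothetical, and the use of a single unsigned comparison to catch both "too small" and "too large" is an assumption about how one check can cover both cases; the 65535 threshold is simply the figure quoted in the post. This is plain C that models the check, not actual XL compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when `subscript` lies in [lower, upper].
   The lower limit is subtracted from the subscript, and the result is
   compared to (upper - lower). Doing the comparison as unsigned is an
   assumption: a subscript below the lower limit wraps to a very large
   value, so one comparison rejects both out-of-range directions. */
int subscript_in_bounds(int64_t subscript, int64_t lower, int64_t upper)
{
    assert(lower <= upper);  /* precondition: a non-empty index range */
    uint64_t offset = (uint64_t)subscript - (uint64_t)lower;
    uint64_t span   = (uint64_t)upper - (uint64_t)lower;
    return offset <= span;
}

/* Hypothetical helper: returns 1 when the limit is small enough for the
   trap-immediate form, using the 65535 threshold quoted in the post.
   A limit computed at execution time would need the register form. */
int limit_fits_immediate(uint64_t limit)
{
    return limit <= 65535u;
}
```

For a Fortran array declared with bounds 1:10, subscript_in_bounds(0, 1, 10) returns 0 because 0 - 1 wraps to the largest unsigned value, which exceeds the span of 9.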
> The performance impact is small for several reasons: 
> 1. The conditional trap instructions are fast. 
> 2. They are executed in a standard integer pipeline. Since most POWER CPUs have 2 or 4 integer pipelines, there is usually an otherwise empty slot to put the trap in, so it is often essentially zero cost. 
> 3. When it can, the compiler optimizer moves the conditional trap out of loops so it is executed only once, checking all loop iterations at once. 
> 4. When it can prove the actual subscript cannot exceed the limit, the optimizer discards the instruction. 
> 5. When it can prove the subscript will always be invalid, the optimizer uses an unconditional trap. 
> 6. If necessary -qcheck can be used during testing and skipped for production builds, but the overhead is small enough that's not usually necessary. 
> If my memory is correct, one long-ago paper reported a 2% slowdown in one case and 0% in another. Since that CPU had only one integer pipeline, the slowdown should be significantly less with modern CPUs. 
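A rough sketch, not from the original post, of what item 3 in the list above amounts to at the source level. The function names and the trap_if stand-in are hypothetical; a real compiler performs this transformation on its internal representation and emits a hardware trap rather than calling abort().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a conditional trap: stop the program when the condition holds. */
static void trap_if(int condition)
{
    if (condition)
        abort();
}

/* Naive form: the subscript is checked on every iteration. */
long sum_checked_each(const long *a, size_t len, size_t n)
{
    assert(a != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        trap_if(i >= len);  /* one check per array reference */
        total += a[i];
    }
    return total;
}

/* Hoisted form: i only takes the values 0..n-1, so a single check of
   n against len before the loop covers every iteration at once. */
long sum_checked_once(const long *a, size_t len, size_t n)
{
    assert(a != NULL);
    trap_if(n > len);       /* one check for the whole loop */
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}
```

Both forms trap exactly when n exceeds len. If n and len were compile-time constants, the check could be dropped entirely (item 4) or replaced by an unconditional trap (item 5).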
> Other checking using the same mechanism is available to detect dereferencing NULL pointers, dividing an integer by zero, using an uninitialized auto variable, specially written asserts, etc. 
> This doesn't include all kinds of invalid memory usage, but it does handle the most common kind, does it very efficiently, and is very easy to use. 
> 
> - Ian McIntosh IBM Canada Lab Compiler Back End Support and Development
> 
> 
> From: "Nicole Trudeau" <nitrud at ca.ibm.com>
> To: linuxppc-users at lists.ozlabs.org
> Date: 2017-06-09 04:23 PM
> Subject: [Linuxppc-users] OpenMPI deviations & Power hardware bounds checking
> Sent by: "Linuxppc-users" <linuxppc-users-bounces+ianm=ca.ibm.com at lists.ozlabs.org>
> 
> 
> 
> 
> Hi everyone,
> 
> I monitor user questions that come in about the IBM XL compilers for Linux on Power, and two things came up on my radar that are not directly related to XL but that the broader Linux on Power team may be able to help answer. Feel free to reply directly on the Stack Overflow or developerWorks links. I really appreciate the help in advance!
> 1. developerWorks Forum: Thanks to Bill Buros for commenting today. Here's a summary of the problem: "I'm having a problem with openmpi 2.1.0 and Abinit. When I run an abinit task with np (number of processes) = 1, the task executes fine. With np = 2, small deviations appear. With larger np, the task ends with an error. I compiled abinit using the GNU C/C++/Fortran 4.8.5 compilers and NVIDIA CUDA 8.0.61."
> 2. Stack Overflow: "Do any CPUs have hardware support for bounds checking?" A reply has been given about x86, but no reply about Power yet. Does Power support it?
> 
> Thanks,
> Nicole Trudeau
> zSystems Software > zBuild > Compilers 
> Digital Marketing
> C2-818, Toronto, Canada Office
> 
> _______________________________________________
> Linuxppc-users mailing list
> Linuxppc-users at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-users
> 