[Skiboot] [PATCH 05/15] libc/string: Add memcpy_ci

Claudio Carvalho cclaudio at linux.vnet.ibm.com
Tue Sep 13 23:25:26 AEST 2016



On 09/03/2016 12:20 AM, Oliver O'Halloran wrote:
> On Fri, Sep 2, 2016 at 2:59 PM, Claudio Carvalho
> <cclaudio at linux.vnet.ibm.com> wrote:
>>
>>
>> On 09/01/2016 06:34 AM, Stewart Smith wrote:
>>> Claudio Carvalho <cclaudio at linux.vnet.ibm.com> writes:
>>>> --- /dev/null
>>>> +++ b/libc/string/memcpy_ci.c
>>>> @@ -0,0 +1,50 @@
>>>> +/* Copyright 2013-2016 IBM Corp
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at
>>>> + *
>>>> + *      http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>>>> + * implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#include <string.h>
>>>> +#include <io.h>
>>>> +
>>>> +const void* memcpy_ci(const void *destpp, const void *srcpp, size_t len)
> 
> I really don't like this name.  If I encoutered a function called
> memcpy_ci() I would assume it performs the copy with CI loads *and*
> stores. This function copies from CI memory to cachable memory so it
> should be called memcpy_from_ci() or similar. This might seem a little
> nitpicky, but mixing cached and CI accesses to the same memory can
> cause a checkstop (even in real mode) so it's important to be clear
> about how memory is being accessed.
> 

Related changes applied for V2:
- File name renamed to memcpy_from_ci.c
- Function renamed to memcpy_from_ci()

>>>> +{
>>>> +    size_t i;
>>>> +    const size_t block = sizeof(uint32_t);
>>>> +
>>>> +    unsigned long int destp = (long int) destpp;
>>>> +    unsigned long int srcp = (long int) srcpp;
>>>> +
>>>> +    /* copy just a few bytes to make dst aligned */
>>>> +    i = len % block;
>>>> +    while (i > 0)
>>>> +    {
>>>> +            *((uint8_t*) destp) = in_8((uint8_t*)srcp);
>>>> +            srcp += 1;
>>>> +            destp += 1;
>>>> +            i--;
>>>> +    }
>>>> +
>>>> +    len -= len % block;
>>>> +
>>>> +    /* copy the aligned bytes */
>>>> +    while(len > 0)
>>>> +    {
> 
> Nitpick: In skiboot we use the Kernel's brace style. Functions should
> have their brace on a new line, but control flow statements should
> have the brace on the same line.
> 

Braces fixed for V2.

>>>> +            *((uint32_t*) destp) = in_be32((uint32_t*)srcp);
>>>> +            srcp += block;
>>>> +            destp += block;
>>>> +            len -= block;
>>>> +    }
>>>> +
>>>> +    return destpp;
>>>> +}
>>>
>>> I think it would be good to add a unit test for this function.
>>>
>>
>> Makes sense. I will figure out how to add a unit test in skiboot for
>> this function.
>>
>>
>>> It also (currently) assumes that src is a multiple of uint32_t, although
>>> nothing checks that and so would silently fail if called for, say 33 bytes.
>>>
>>
>> Not sure if I understand your suggestion here.
>>
>> This function doesn't check if srcpp and dstpp are multiple of uint32_t,
>> but it assumes that the processor is able to read a char and a uint32_t
>> from src. Thus, the first loop copies at most len % sizeof(uint32_t) to
>> optimize the number of reads
> 
> The length checking is fine, the bigger issue here is that the first
> loop will leave srcp unaligned.  If we wanted to copy 7 bytes starting
> from 0x0 the first loop would move srcp to 0x3 before attempting to
> finish the copy with a single 32 bit load. On P8 unaligned CI loads
> will cause an alignment interrupt so the assumption that the processor
> can read a uint32_t from srcp is faulty. If you want to keep this
> optimisation the first loop needs to align srcp so that (srcp % block)
> == 0. The bulk of the copy can then be done with with 32 (or 64bit)
> loads and finished with a byte-by-byte copy. You could move the byte
> copies into a helper function to eliminate all the int <-> pointer
> casting too.
> 

Is this better? I am not sure if I understand what you mean about the
helper function.


void* memcpy_from_ci(void *destpp, const void *srcpp, size_t len)
{
          const size_t block = sizeof(uint32_t);

          unsigned long int destp = (long int) destpp;
          unsigned long int srcp = (long int) srcpp;

          /* Copy as many blocks as possible if srcp is block aligned */
          if ((srcp % block) == 0) {

                  while ((len - block) > -1) {
                          *((uint32_t*) destp) =
                                               in_be32((uint32_t*)srcp);
                          srcp += block;
                          destp += block;
                          len -= block;
                  }
          }

          /*
           * Byte-by-byte copy if srcp is not block aligned or len
           * is/becomes less than one block
           */
          while (len > 0) {
                  *((uint8_t*) destp) = in_8((uint8_t*)srcp);
                  srcp += 1;
                  destp += 1;
                  len--;
          }

          return destpp;
}

>>
>>
>> _______________________________________________
>> Skiboot mailing list
>> Skiboot at lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/skiboot
> 
> 



More information about the Skiboot mailing list