[Skiboot] [PATCH 05/15] libc/string: Add memcpy_ci

Wed Sep 14 19:10:33 AEST 2016

Oliver O'Halloran <oohall at gmail.com> writes:
>> On 09/01/2016 06:34 AM, Stewart Smith wrote:
>>> Claudio Carvalho <cclaudio at linux.vnet.ibm.com> writes:
>>>> --- /dev/null
>>>> +++ b/libc/string/memcpy_ci.c
>>>> @@ -0,0 +1,50 @@
>>>> +/* Copyright 2013-2016 IBM Corp
>>>> + *
>>>> + * Licensed under the Apache License, Version 2.0 (the "License");
>>>> + * you may not use this file except in compliance with the License.
>>>> + * You may obtain a copy of the License at
>>>> + *
>>>> + *      http://www.apache.org/licenses/LICENSE-2.0
>>>> + *
>>>> + * Unless required by applicable law or agreed to in writing, software
>>>> + * distributed under the License is distributed on an "AS IS" BASIS,
>>>> + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
>>>> + * implied.
>>>> + * See the License for the specific language governing permissions and
>>>> + * limitations under the License.
>>>> + */
>>>> +
>>>> +#include <string.h>
>>>> +#include <io.h>
>>>> +
>>>> +const void* memcpy_ci(const void *destpp, const void *srcpp, size_t len)
>
> I really don't like this name.  If I encoutered a function called
> memcpy_ci() I would assume it performs the copy with CI loads *and*
> stores. This function copies from CI memory to cachable memory so it
> should be called memcpy_from_ci() or similar. This might seem a little
> nitpicky, but mixing cached and CI accesses to the same memory can
> cause a checkstop (even in real mode) so it's important to be clear
> about how memory is being accessed.

++

I see this is changed in V2, which is great.

I wonder if we could do any pointer annotation and sparse magic to track
things? Maybe not yet worth it though.

>>> It also (currently) assumes that src is a multiple of uint32_t, although
>>> nothing checks that and so would silently fail if called for, say 33 bytes.
>>>
>>
>> Not sure if I understand your suggestion here.
>>
>> This function doesn't check if srcpp and dstpp are multiple of uint32_t,
>> but it assumes that the processor is able to read a char and a uint32_t
>> from src. Thus, the first loop copies at most len % sizeof(uint32_t) to
>> optimize the number of reads
>
> The length checking is fine, the bigger issue here is that the first
> loop will leave srcp unaligned.  If we wanted to copy 7 bytes starting
> from 0x0 the first loop would move srcp to 0x3 before attempting to
> finish the copy with a single 32 bit load. On P8 unaligned CI loads
> will cause an alignment interrupt so the assumption that the processor
> can read a uint32_t from srcp is faulty. If you want to keep this
> optimisation the first loop needs to align srcp so that (srcp % block)
> == 0. The bulk of the copy can then be done with with 32 (or 64bit)
> loads and finished with a byte-by-byte copy. You could move the byte
> copies into a helper function to eliminate all the int <-> pointer
> casting too.

Ahh yes, unaligned CI... I knew there was something in my brain about
that. Note to self: memorize more docs.

-- 
Stewart Smith
OPAL Architect, IBM.