[RFC PATCH 00/17] powerpc/e500: separate e500 from e500mc

Fri Nov 11 11:38:32 EST 2011

On Nov 10, 2011, at 11:54, Scott Wood wrote:
> On Thu, Nov 10, 2011 at 10:30:41AM -0600, Kumar Gala wrote:
>> On Nov 10, 2011, at 10:17 AM, Moffett, Kyle D wrote:
>>> Furthermore, it looks like there are a couple issues here I missed
>>> before.  PPC64 systems all appear to have an L1_CACHE_SHIFT of 7,
>>> except when you turn on the P5020DS board option which magically
>>> changes it to "6" and breaks lord-knows-what.  I think my patch
>>> series actually "breaks" that and makes e5500 use 7 as well.
>> 
>> a value of '6' on E5500 / P5020DS is correct and doesn't break anything.
>> Setting it to 7 is wrong and thus the code is correct today.
>> 
>>> Are you sure that a kernel built to support E5500 can also run on
>>> other 64-bit PowerPC/POWER systems?
>> 
>> No it will not.  There is not expectation of that as E5500 is an
>> embedded / Book-E class part and uses that ISA version.  Book-S
>> (server) 64-bit machines are not OS compatible and we are not trying to
>> make them as such (but we do re-use a lot of code).
> 
> What about other 64-bit book3e chips?  What cache block size does A2 have?

Ok, so I've been poking around this code a bunch and as far as I can
tell, the cacheline stuff has basically always been subtly wrong in
twelve different ways and it's only largely coincidence that it works
today.

So PowerPC64 systems have their own "ppc64_caches" structure set up
before start_kernel() is called by parsing the OpenFirmware "cpu" nodes.
That structure is then checked in every piece of 64-bit kernel code
(except xmon) that uses the "dcbXX" and "icbXX" opcodes.

There is an entirely separate mechanism built into the "cputable" that
is used on all PowerPC systems to compute cacheline sizes to pass in via
ELF headers for userspace to use in memset()/memcpy(), etc.

Furthermore, the VDSO gets cacheline sizes stored into it, but on 64-bit
they come from the ppc64_caches structure and on 32-bit they come from
dcache_bsize/icache_bsize copied from the cputable.

Then there's the value in arch/powerpc/include/asm/cache.h which is used
throughout the kernel to figure out how far apart to space CPU-specific
datastructures (EG: __cacheline_aligned_on_smp).

Despite the fact that all PPC64 have an "L1_CACHE_SIZE" value of 128,
the PowerPC A2 and e5500 have {d,i}cache_bsize values of 64 in cputable
and presumably also get correct values from OpenFirmware, so the bogus
constant in asm/cache.h does nothing more than waste a bit of memory
for unnecessary padding.

Unfortunately, lots of PPC32 assembly pretends that the value found in
asm/cache.h is a hard truth and uses it for "dcbz", etc, which is why
there are all of those ugly #ifdefs in asm/cache.h

Based on all of that, my proposal is going to be a patch which does the
following:

  (1) Conditionally set L1_CACHE_SHIFT to the maximum value used by any
      platform being compiled in for alignment purposes.

  (2) Make the ppc64_caches struct apply to ppc32 as well, and
      preinitialize it with a minimum value used by any platform being
      compiled in (for "dcbXX"/"icbXX" purposes).  This is safe because
      the pagesize is always a multiple of the cache block size and the
      kernel only uses dcbXX/icbXX on whole pages.  The only impact is a
      temporary small performance hit from flushing or zeroing the same
      block 8 times if too small.

  (3) Try to initialize the ppc_caches struct on ppc32 from the
      OpenFirmware device-tree.  If that fails, then use the values we
      find in the cputable.  After this is initialized any performance
      hit in copy_page()/zero_page() will obviously disappear.

  (4) Fix all of the PPC32 assembly code that is misusing L1_CACHE_SHIFT
      to use the ppc_caches struct instead.

Does that sound like a reasonable approach?

Cheers,
Kyle Moffett

--
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/