Feedback wished on possible improvement of CPU15 errata handling on mpc8xx

Joakim Tjernlund joakim.tjernlund at transmode.se
Fri Aug 30 07:26:18 EST 2013


leroy christophe <christophe.leroy at c-s.fr> wrote on 2013/08/29 23:04:03:
> 
> On 29/08/2013 19:57, Joakim Tjernlund wrote:
> > "Linuxppc-dev"
> > <linuxppc-dev-bounces+joakim.tjernlund=transmode.se at lists.ozlabs.org>
> > wrote on 2013/08/29 19:11:48:
> >> The mpc8xx powerpc has an erratum identified as CPU15: whenever the
> >> last instruction of a page is a conditional branch to the last
> >> instruction of the next page, the CPU might do crazy things.
> >>
> >> To work around this erratum, one of the workarounds proposed by
> >> Freescale is:
> >> "In the ITLB miss exception code, when loading the TLB for an MMU page,
> >> also invalidate any TLB referring to the next and previous page using
> >> tlbie. This intentionally forces an ITLB miss exception on every
> >> execution across sequential MMU page boundaries"
> >>
> >> It is that workaround which has been implemented in the kernel. The
> >> drawback of this workaround is that a TLB miss is encountered every time
> >> we cross a page boundary. On a flat program execution, it means that we
> >> get a TLB miss every 1000 instructions. Handling a TLB miss takes around
> >> 30 to 40 instructions, which means a performance degradation of about 4%.
> >> It can be even worse if the program has a loop astride two pages.
> >>
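
As a rough check of the estimate above, here is a minimal sketch assuming 4 KB pages, 4-byte instructions and straight-line execution (one ITLB miss per page crossed). The function name and constants are illustrative only and are not taken from the kernel:

```c
/* Assumptions for this sketch: 4 KB pages, fixed 4-byte instructions,
 * and every handler instruction costing the same as a normal one. */
#define PAGE_SIZE_BYTES 4096u
#define INSN_SIZE_BYTES 4u

/* Fraction of extra instructions executed when every page crossing
 * triggers an ITLB miss handled in `handler_insns` instructions. */
double itlb_miss_overhead(unsigned handler_insns)
{
    unsigned insns_per_page = PAGE_SIZE_BYTES / INSN_SIZE_BYTES; /* 1024 */

    return (double)handler_insns / (double)insns_per_page;
}
```

With a handler of 30 to 40 instructions this gives roughly 3% to 4%, in line with the figure quoted above. A loop that straddles two pages would pay the handler cost on every iteration, which this simple model does not capture.
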
> >> In the errata document from Freescale, there is an example where they
> >> only invalidate the TLB when the page has the actual issue, that is,
> >> when the page has the offending instruction at offset 0xffc, and they
> >> suggest using the available PTE bits to tag pages in advance.
> >>
> >> I checked in asm/pte-8xx.h: we still have one SW bit available
> >> (0x0080). So I was thinking about using that bit to mark pages
> >> CPU15_SAFE when loading them if they don't have the offending
> >> instruction.
> >> Then, in the ITLBmiss handler, instead of always invalidating the
> >> preceding and following pages, we would check the SW bit in the PTE and
> >> invalidate the following page only if the current page is not marked
> >> CPU15_SAFE, then check the PTE of the preceding page and invalidate it
> >> only if it is not marked CPU15_SAFE.
> >>
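
The tagging scheme described above could be sketched in C as follows. This only illustrates the logic and is not kernel code: PTE_CPU15_SAFE, cpu15_tag_pte() and cpu15_decide() are hypothetical names, the real ITLB miss handler is written in assembly, and the branch test is deliberately conservative (any bc, bclr or bcctr at offset 0xffc marks the page as unsafe, without looking at the branch target).

```c
#include <stdbool.h>
#include <stdint.h>

#define LAST_INSN_OFFSET 0xffcu  /* last instruction slot of a 4 KB page */
#define PTE_CPU15_SAFE   0x0080u /* hypothetical name for the free SW bit */

/* Read the big-endian instruction word at the end of the page. */
static uint32_t last_insn(const uint8_t *page)
{
    const uint8_t *p = page + LAST_INSN_OFFSET;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Conservative test: primary opcode 16 is bc; primary opcode 19 with
 * extended opcode 16 or 528 is bclr or bcctr. */
static bool is_cond_branch(uint32_t insn)
{
    uint32_t primary = insn >> 26;
    uint32_t xo = (insn >> 1) & 0x3ffu;

    return primary == 16u ||
           (primary == 19u && (xo == 16u || xo == 528u));
}

/* When a page is loaded: set the SAFE bit unless its last word is a branch. */
uint32_t cpu15_tag_pte(uint32_t pte, const uint8_t *page)
{
    if (!is_cond_branch(last_insn(page)))
        pte |= PTE_CPU15_SAFE;
    return pte;
}

struct cpu15_action {
    bool invalidate_next; /* tlbie the following page */
    bool invalidate_prev; /* tlbie the preceding page */
};

/* In the ITLB miss path: invalidate neighbours only when needed. */
struct cpu15_action cpu15_decide(uint32_t cur_pte, uint32_t prev_pte)
{
    struct cpu15_action a;

    a.invalidate_next = !(cur_pte & PTE_CPU15_SAFE);
    a.invalidate_prev = !(prev_pte & PTE_CPU15_SAFE);
    return a;
}
```

Keeping the bit value in a single macro would also make it easy to switch to a different PTE bit if the last free SW bit is needed for something else.
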
> >> I believe this would improve the CPU15 errata handling and would reduce
> >> the overhead introduced by the handling of this errata.
> >>
> >> Do you see anything wrong with my proposal?
> > Just that you are using up the last bit of the pte which will be needed
> > at some point.
> > Have you run into CPU15? We have been using 8xx for more than 10 years on
> > kernel 2.4 and I don't think we ever ran into this problem.
> Ok, indeed I have activated the CPU15 errata handling in the kernel because
> I know my CPU has the bug.
> Do you think it can be deactivated without much risk though?

Can't say for you; all I know is that our 860 and 862 CPUs seem to work OK.

> > If you go forward with this I suggest you use the WRITETHRU bit instead
> > and make it so the user can choose which to use.
> >
> > If you want to optimize TLB misses you might want to add support for 8MB
> > pages, I got the TLB and kernel memory done in my 2.4 kernel. You could
> > start with that and add 8MB user space pages.
> In the 2.6 kernel we have CONFIG_PIN_TLB which pins the first 8Mbytes in
> ITLB and pins the first 24Mbytes in DTLB as far as I understand. Do we
> need more for the kernel? If so, yes I would be interested in porting
> your code to 2.6

Yes, 2.4 has the same. There is a drawback with pinning though: you pin 4
ITLBs and 4 DTLBs.
One only needs 1 ITLB for the kernel so the other 3 are unused. 24MB of DTLBs
is pretty static, chances are that it is either too much or too little.

> 
> Wouldn't we waste memory by using 8Mbytes pages in user mode?

Don't know the details of how user space deals with these pages, hopefully
someone else knows better.

> I read somewhere that Transparent Huge Pages have been ported to powerpc
> in the upcoming 3.11 kernel. Therefore I was thinking about maybe adding
> support for hugepages to 8xx.
> 8xx has 512kbytes hugepages, I was thinking that maybe it would be more
> appropriate than 8Mbytes pages.

See previous comment, although 8MB pages mean fewer TLB insns, as I recall.

> Do you think it would be feasible and useful to do this for embedded
> systems having let's say 32 to 128Mbytes RAM?

One could stop at just kernel memory. With 8MB pages there are some
additional advantages compared with PINNED TLBs:
- you map all kernel memory
- you can also map other spaces, I got both IMMR/BCR and all my NOR FLASH
  mapped with 8MB pages.
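
To put numbers on the comparison above, here is a small sketch that counts how many TLB entries of a given page size are needed to map a given amount of memory, and how much memory is left outside a pinned region. The helper names are made up for the example; the sizes are the ones mentioned in this thread.

```c
#include <stdint.h>

#define KB(x) ((uint32_t)(x) << 10)
#define MB(x) ((uint32_t)(x) << 20)

/* Number of TLB entries needed to map `bytes` using pages of `page_size`
 * (rounded up, so a partially used page still costs one entry). */
uint32_t tlb_entries_needed(uint32_t bytes, uint32_t page_size)
{
    return (bytes + page_size - 1u) / page_size;
}

/* Bytes of RAM that fall outside a pinned region of `pinned` bytes;
 * 0 means the pinned entries cover more than the board actually has. */
uint32_t uncovered_by_pin(uint32_t ram, uint32_t pinned)
{
    return ram > pinned ? ram - pinned : 0u;
}
```

For example, tlb_entries_needed(MB(32), MB(8)) is 4 and tlb_entries_needed(MB(128), MB(8)) is 16, while tlb_entries_needed(MB(128), KB(512)) is 256. With 24MB pinned, uncovered_by_pin(MB(32), MB(24)) leaves 8MB to be handled by ordinary TLB misses, and a 16MB board is fully covered with pinned space to spare.
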

 Jocke

