[RFC/PATCH] powerpc: MPC7450 L2 HW cache flush feature utilization

Thu Jun 28 18:35:23 EST 2007

>>> First, I'm looking for help and advice on why the current _set_L2CR()
>>> implementation may not work for MPC7450 (namely 7448 with 1 MB of L2
>>> cache installed). Is it a bug in _set_L2CR() or a hardware problem?
>>
>> I think that if anyone here could answer this straight
>> away, the source code would have been fixed already ;-)
>
> I think I can try to answer this question. Please look through my
> thoughts below and correct me where I'm wrong.

You forgot step 0: the goal of flushing the caches here
is to make sure there is no data at all in there after it
has finished.

> The current scheme of flushing the caches is based on a number of
> consecutive lwz/dcbf instructions. A contiguous memory region (starting
> from zero) is read by a series of lwz instructions, and then the cache
> is flushed using a sequence of dcbf instructions with addresses from
> this memory range. If I understand correctly, for this approach to work
> it must be guaranteed that, after the memory region has been read,
> every line in the cache holds data from this region. Otherwise, if
> some cache lines hold data from another address range, they will not
> be flushed by the dcbf instruction sequence.

Yes, you need to ensure there is nothing interfering (SMP
agents, DMA agents, prefetch engines...), and you need to
know the line replacement policy, to make this work;
furthermore, you need to be quite careful in your code to
make sure the intended L2 stores are the _only_ L2 traffic
you generate.

> Further, how cache lines are used is dictated by the cache line
> replacement policy. I didn't go into the details deeply, but on MPC7450
> the L1 cache line replacement policy seems to satisfy the requirement
> above. At least the MPC7450 reference manual describes an L1 cache
> flushing algorithm based on a sequence of lwz/dcbf instructions.
>
> But for the L2/L3 caches, the manual describes two different
> cache line replacement policies. And both are pseudo-random

At least one is the "standard" PowerPC pseudo-LRU tree, no?
That one flushes fine using this strategy.

> and differ in the implementation of the random number generator. It
> means that a cache line in a set is chosen randomly, and that, in turn,
> means that there is a probability that some cache lines are not used
> while reading the contiguous memory region and so are not flushed by
> the dcbf instruction sequence.

Knowing that there is no "outside" interference, and knowing
the "random" algorithm, can give plenty of guarantees.

> For example, on MPC7448 there is an eight-way set-associative 1 MB L2
> cache that consists of 2048 sets x 8 ways per set. And even if a set N
> has been accessed M times (M > 8) there is a chance that some cache
> line in set N has never been used, while another line has been used
> twice or more. Of course, the probability of such a situation decreases
> as M increases.

You can make sure, too.  Just trying to statistically get
to the point where you are sure the whole cache is flushed
is not going to work *at all*; you need to use deeper knowledge
of how the cache works.

> The current _set_L2CR() implementation reads the first 4 MB of memory
> to flush the L2 cache. I have increased this size to 16 MB and now
> things work fine. But I don't think that is the right way to fix the
> problem, because there is no way to define an upper limit on the memory
> size that guarantees flushing of each cache line. 16 MB is too large
> anyway. It seems more reasonable to use the stable and guaranteed way
> to flush the cache that is implemented in hardware.

Yes, use the hardware flush mechanism.  Please :-)

[I think the erratum is about insn fetches to L2 that you
have no way to stop.  <handwaving>Something like that,
anyway.</handwaving>]

Segher