[PATCH 0/2] Faster MMU lookups for Book3s v3

Thu Jul 1 22:28:06 EST 2010

Avi Kivity wrote:
> On 07/01/2010 01:00 PM, Alexander Graf wrote:
>>
>> But doesn't that mean that you still need to loop through all the hvas
>> that you want to invalidate?
>
> It does.
>
>>   Wouldn't it speed up dirty bitmap flushing
>> a lot if we'd just have a simple linked list of all sPTEs belonging to
>> that memslot?
>>
>
> The complexity is O(pages_in_slot) + O(sptes_for_slot).
>
> Usually, every page is mapped at least once, so sptes_for_slot
> dominates.  Even when it isn't so, iterating the rmap base pointers is
> very fast since they are linear in memory, while sptes are scattered
> around, causing cache misses.

Why would pages be mapped often? Don't you use lazy spte updates?

>
> Another consideration is that on x86, an spte occupies just 64 bits
> (for the hardware pte); if there are multiple sptes per page (rare on
> modern hardware), there is also extra memory for rmap chains;
> sometimes we also allocate 64 bits for the gfn.  Having an extra
> linked list would require more memory to be allocated and maintained.

Hrm. I was thinking of not having an rmap but only using the chain. The
only slots that would require such a chain would be the ones with dirty
bitmapping enabled, so no penalty for normal RAM (unless you use Kemari
or live migration, of course).
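
A minimal sketch of the chain-only idea, to make the cost concrete. All
names here (spte_node, dirty_slot, slot_track_spte,
slot_write_protect_all, SPTE_WRITABLE) are hypothetical and are not
existing KVM code; the writable bit position is only illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "writable" flag; the real bit depends on the architecture. */
#define SPTE_WRITABLE (1ULL << 1)

/* One chain node per shadow PTE that maps a page of the slot. */
struct spte_node {
    uint64_t *sptep;         /* the shadow PTE this node tracks */
    struct spte_node *next;  /* next spte belonging to the same slot */
};

/* Per-memslot state; the chain is only used while dirty logging is on. */
struct dirty_slot {
    struct spte_node *chain;
};

/* Link a newly created spte into the slot's chain (O(1), push to head). */
static void slot_track_spte(struct dirty_slot *slot, struct spte_node *node,
                            uint64_t *sptep)
{
    assert(slot && node && sptep);
    node->sptep = sptep;
    node->next = slot->chain;
    slot->chain = node;
}

/*
 * Write-protect every tracked spte so the next guest write faults and
 * can be logged. Cost is O(sptes_for_slot); there is no per-page loop.
 * Returns the number of sptes that were actually changed.
 */
static size_t slot_write_protect_all(struct dirty_slot *slot)
{
    size_t changed = 0;

    for (struct spte_node *it = slot->chain; it; it = it->next) {
        if (*it->sptep & SPTE_WRITABLE) {
            *it->sptep &= ~SPTE_WRITABLE;
            changed++;
        }
    }
    return changed;
}
```

The trade-off is the one discussed above: the walk drops the
O(pages_in_slot) term, but each spte costs an extra node, and the walk
still touches sptes that are scattered in memory.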

But then again, I probably do need an rmap for the mmu_notifier magic,
right? Still, I'd rather have that code path be slow and the dirty
bitmap invalidation fast than the other way around. Swapping is slow
either way.

Alex