[PATCH 1/3] kexec: Prevent removal of memory in use by a loaded kexec image

David Hildenbrand david at redhat.com
Wed Apr 22 00:30:18 AEST 2020


>> b) "kexec -s -l" seems to work fine. For now, the kernel does not seem
>> to get placed on virtio-mem memory (pure luck due to the left-to-right
>> search). Memory added by virtio-mem is not getting added to the e820
>> map. Once the virtio-mem driver comes back up in the kexec kernel, the
>> right memory is readded.
> 
> This sounds like a bug.

This is how virtio-mem wants its memory to get handled.

> 
>> c) "kexec -c -l" does not work properly. All memory added by virtio-mem
>> is added to the e820 map, which is wrong. Memory that should not be
>> touched will be touched by the kexec kernel. I assume kexec-tools just
>> goes ahead and adds anything it can find in /proc/iomem (or
>> /sys/firmware/memmap/) to the e820 map of the new kernel.
>>
>> Due to c), I assume all hotplugged memory (e.g., ACPI DIMMs) is
>> similarly added to the e820 map and, therefore, won't be able to be
>> onlined MOVABLE easily.
> 
> This sounds like correct behavior to me.  If you add memory to the
> system it is treated as memory to the system.

Yeah, I would agree if we are talking about DIMMs, but this memory is
special. It's added via a paravirtualized interface and will contain
holes, especially after unplug. While memory in these holes can usually
be read, it should not be written. More on that below.

> 
> If we need to make it a special kind of memory with special rules we can
> have some kind of special marking for the memory.  But hotplugged is not
> in itself a sufficient criteria to say don't use this as normal memory.

Agreed. It is special, though.

> 
> If take a huge server and I plug in an extra dimm it is just memory.

Agreed.

[...]

> 
> Now perhaps virtualization needs a special tier of memory that should
> only be used for cases where the memory is easily movable.
> 
> I am not familiar with virtio-mem but my skim of the initial design
> is that virtio-mem was not designed to be such a special tier of memory.
> Perhaps something has changed?
> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg03870.html

Yes, a lot changed. See
https://lkml.kernel.org/r/20200311171422.10484-1-david@redhat.com for
the latest-greatest design overview.


> 
>> b) Teach kexec-tools to leave virtio-mem added memory alone. E.g., by
>> indicating it in /proc/iomem in a special way ("System RAM
>> (hotplugged)"/"System RAM (virtio-mem)").
> 
> How does the kernel memory allocator treat this memory?

So what virtio-mem does is add memory sections on demand and populate
within these sections the requested amount of memory. E.g., if 64MB are
requested, it will add a 128MB section/resource but only make the first
64MB accessible (via the hypervisor) and only give the first 64MB to the
buddy. This way of adding memory is similar to what the Xen and Hyper-V
balloon drivers do when hotplugging memory.

When requested to plug more memory, it might go ahead and make (parts
of) the remaining 64MB accessible and give them to the buddy. In case it
cannot "fill any holes", it will add a new section.

When requested to unplug memory, it will try to remove memory from the
buddy within the ranges it added (here, the 64MB) and tell the
hypervisor about it.
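
The plug/unplug behavior described above can be modeled roughly as
follows. This is only a toy sketch, not driver code: the class names,
the 4MB block granularity, and the base address are assumptions picked
for illustration. Only the 128MB section size and the 64MB request come
from the example above.

```python
MB = 1 << 20
SECTION_SIZE = 128 * MB  # section size from the example above
BLOCK_SIZE = 4 * MB      # hypothetical plug granularity (assumption)


class Section:
    """One added memory section; tracks which blocks are plugged."""

    def __init__(self, start):
        self.start = start
        self.plugged = [False] * (SECTION_SIZE // BLOCK_SIZE)

    def plugged_bytes(self):
        return sum(self.plugged) * BLOCK_SIZE


class VirtioMemModel:
    """Toy model: fill holes in existing sections before adding new ones."""

    def __init__(self, base=0x1_0000_0000):  # base address is arbitrary
        self.next_start = base
        self.sections = []

    @staticmethod
    def _fill(section, remaining):
        # Plug unplugged blocks in this section until the request is met.
        for i, used in enumerate(section.plugged):
            if remaining == 0:
                break
            if not used:
                section.plugged[i] = True  # would be handed to the buddy
                remaining -= 1
        return remaining

    def plug(self, size):
        remaining = size // BLOCK_SIZE
        # First try to fill holes in sections that already exist.
        for section in self.sections:
            remaining = self._fill(section, remaining)
        # Only add a new section when no holes are left to fill.
        while remaining > 0:
            section = Section(self.next_start)
            self.next_start += SECTION_SIZE
            self.sections.append(section)
            remaining = self._fill(section, remaining)

    def unplug(self, size):
        remaining = size // BLOCK_SIZE
        # Take blocks back, starting from the most recently added memory.
        for section in reversed(self.sections):
            for i in reversed(range(len(section.plugged))):
                if remaining == 0:
                    return
                if section.plugged[i]:
                    section.plugged[i] = False  # hypervisor would be told
                    remaining -= 1

    def plugged_bytes(self):
        return sum(s.plugged_bytes() for s in self.sections)
```

Plugging 64MB into an empty model creates one 128MB section with only
its first half plugged. A later request first fills the remaining half
before a second section is added. The model never removes an empty
section and ignores whether memory can actually be freed, both of which
matter in a real driver.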

So, it has some similarity to ballooning in virtual environments;
however, it manages its own device memory only and can therefore give
better guarantees and detect malicious guests.

Right now, I think the right approach would be to not create
/sys/firmware/memmap entries for memory added by virtio-mem.
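
For comparison, option b) quoted above (marking the memory in
/proc/iomem and having user space skip it) could look roughly like the
sketch below. It is an illustration only, not kexec-tools code. The
resource names "System RAM (hotplugged)" and "System RAM (virtio-mem)"
are the labels proposed in this thread, not names the kernel is known to
emit, and the function name and sample input are made up.

```python
import re

# Labels proposed in option b); treated here as hypothetical markers.
SKIPPED_NAMES = {"System RAM (hotplugged)", "System RAM (virtio-mem)"}

# A /proc/iomem line looks like "<start>-<end> : <name>" in hex.
LINE_RE = re.compile(r"^([0-9a-f]+)-([0-9a-f]+) : (.+)$")


def classify_ram(iomem_text):
    """Return (usable, skipped) lists of (start, end) RAM ranges."""
    usable, skipped = [], []
    for line in iomem_text.splitlines():
        if line.startswith(" "):
            continue  # indented lines are child resources; ignore them
        match = LINE_RE.match(line)
        if not match:
            continue
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        name = match.group(3).strip()
        if name == "System RAM":
            usable.append((start, end))   # would go into the new memory map
        elif name in SKIPPED_NAMES:
            skipped.append((start, end))  # left for the driver to re-add
    return usable, skipped


SAMPLE = """\
00001000-0009fbff : System RAM
000a0000-000bffff : PCI Bus 0000:00
00100000-bffdffff : System RAM
  01000000-01e00000 : Kernel code
100000000-13fffffff : System RAM
140000000-147ffffff : System RAM (virtio-mem)
"""

usable, skipped = classify_ram(SAMPLE)
```

With the sample input, the three plain "System RAM" ranges are reported
as usable and the specially marked range is kept out of the map.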

[...]

> 
> p.s.  Please excuse me for jumping in I may be missing some important
> context, but what I read when I saw this message in my inbox just seemed
> very wrong.

Yeah, still, thanks for having a look. Please let me know if you need
more information.

-- 
Thanks,

David / dhildenb


