[PATCH v9 0/5] arm64/riscv: Add support for crashkernel CMA reservation

Tue Mar 24 15:02:51 AEDT 2026

On 2026/3/24 0:55, Andrew Morton wrote:
> On Mon, 23 Mar 2026 15:27:40 +0800 Jinjie Ruan <ruanjinjie at huawei.com> wrote:
> 
>> The crash memory allocation, and the exclude of crashk_res, crashk_low_res
>> and crashk_cma memory are almost identical across different architectures,
>> This patch set handle them in crash core in a general way, which eliminate
>> a lot of duplication code.
>>
>> And add support for crashkernel CMA reservation for arm64 and riscv.
> 
> Thanks.  AI review has completed and it asks questions:
> 	https://sashiko.dev/#/patchset/20260323072745.2481719-1-ruanjinjie@huawei.com

I believe it identified 4 valid issues:

- The already-known bug in the existing RISC-V code where crashk_low_res
is not excluded.

- A memory leak in the existing PowerPC code.

- The ordering issue of adding CMA ranges to "linux,usable-memory-range".

- An existing concurrency issue. On almost all existing architectures, a
concurrent memory hotplug may occur between reading memblock and filling
cmem during kexec_load(). I'm not sure whether this is an issue in
practice.

 Race Condition Scenario

  Timeline:
  ---------------------------------------------------------------------
  T1: kexec_load() syscall starts
  T2: kexec_trylock() acquires kexec_lock
  T3: crash_prepare_headers() is called
  T4: arch_get_system_nr_ranges() queries memblock → finds 100 memory ranges
  T5: cmem = alloc_cmem(100) allocates buffer for 100 ranges
  T6: [RACE WINDOW] Another process triggers memory hotplug
  T7: add_memory() → lock_device_hotplug() → memblock_add_node()
  T8: New memory region added to memblock
  T9: arch_crash_populate_cmem() iterates: now finds 102 ranges
  T10: cmem->ranges[100] → OUT OF BOUNDS WRITE!
  T11: cmem->ranges[101] → OUT OF BOUNDS WRITE!
  T12: Kernel crash or memory corruption
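
To illustrate steps T5 to T10, here is a minimal userspace sketch of a
count-then-fill buffer. The names struct cmem_model, cmem_model_alloc() and
cmem_model_populate() are hypothetical stand-ins and not kernel code. The
capacity check in the populate loop is an assumption added for illustration:
it only turns the overflow into an error and does not close the race window.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins; these are not the kernel's definitions. */
struct range {
    unsigned long long start;
    unsigned long long end;
};

struct cmem_model {
    unsigned int max_nr_ranges; /* capacity fixed at allocation time (T5) */
    unsigned int nr_ranges;     /* entries filled so far */
    struct range ranges[];      /* sized for max_nr_ranges entries */
};

/* Allocate room for exactly max_nr_ranges entries, as in step T5. */
static struct cmem_model *cmem_model_alloc(unsigned int max_nr_ranges)
{
    struct cmem_model *cmem;

    cmem = calloc(1, sizeof(*cmem) + max_nr_ranges * sizeof(cmem->ranges[0]));
    if (cmem)
        cmem->max_nr_ranges = max_nr_ranges;
    return cmem;
}

/*
 * Copy n source ranges into cmem, as in step T9.
 * Returns 0 on success, or -ENOMEM if the source holds more ranges than
 * were counted when the buffer was allocated.
 */
static int cmem_model_populate(struct cmem_model *cmem,
                               const struct range *src, unsigned int n)
{
    unsigned int i;

    assert(cmem != NULL);
    cmem->nr_ranges = 0;
    for (i = 0; i < n; i++) {
        /* Without this check, the next store is the T10 out-of-bounds write. */
        if (cmem->nr_ranges >= cmem->max_nr_ranges)
            return -ENOMEM;
        cmem->ranges[cmem->nr_ranges++] = src[i];
    }
    return 0;
}
```

Allocating for 2 ranges and then populating from 4 makes the second call
return -ENOMEM with nr_ranges left at 2, instead of writing past the buffer.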

  Why This Happens

  1. Different locks used:
    - kexec_load() uses kexec_trylock (atomic_t)
    - Memory hotplug uses device_hotplug_lock (mutex)
  2. No synchronization between these two operations
  3. Time-of-check to time-of-use (TOCTOU) issue:
    - Step T4-T5: We query the number of ranges and allocate buffer
    - Step T6-T9: Memory hotplug adds new ranges between query and
      population

Any comments or suggestions on the following approach?

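int crash_prepare_headers(...)
{
    unsigned int max_nr_ranges;
    struct crash_mem *cmem;
    int ret;

    // Hold device_hotplug_lock across both the range count and the
    // cmem population, so memory hotplug cannot change memblock in between.
    lock_device_hotplug();

    max_nr_ranges = arch_get_system_nr_ranges();
    // ...
    ret = arch_crash_populate_cmem(cmem);
    // ...

    unlock_device_hotplug();
    return ret;
}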
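
The locking idea can also be modelled in userspace. The sketch below is only
an analogy under stated assumptions: a pthread mutex plays the role of
device_hotplug_lock, a plain counter plays the role of memblock, and all
names (hotplug_lock_model, hotplug_add_model(), prepare_headers_model()) are
hypothetical. It shows the count, the allocation and the population sharing
one critical section, with a single unlock path for the error case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MODEL_MAX_RANGES 8

/* Hypothetical stand-ins: one mutex in the role of device_hotplug_lock,
 * one counter in the role of the memblock range count. */
static pthread_mutex_t hotplug_lock_model = PTHREAD_MUTEX_INITIALIZER;
static unsigned int memblock_model_cnt = 3;

/* Hotplug side: takes the same lock before adding a range. */
static void hotplug_add_model(void)
{
    pthread_mutex_lock(&hotplug_lock_model);
    if (memblock_model_cnt < MODEL_MAX_RANGES)
        memblock_model_cnt++;
    pthread_mutex_unlock(&hotplug_lock_model);
}

/*
 * kexec side: count, allocate and populate inside one critical section.
 * Returns the number of ranges populated, or -1 if allocation fails.
 */
static int prepare_headers_model(void)
{
    unsigned int max_nr_ranges, i, filled = 0;
    unsigned int *ranges;
    int ret = -1;

    pthread_mutex_lock(&hotplug_lock_model);

    max_nr_ranges = memblock_model_cnt;              /* query (T4) */
    ranges = calloc(max_nr_ranges, sizeof(*ranges)); /* allocate (T5) */
    if (!ranges)
        goto out_unlock;

    /* Populate (T9): the count cannot have changed since the query,
     * because the hotplug side needs the lock held here. */
    for (i = 0; i < memblock_model_cnt; i++) {
        assert(i < max_nr_ranges);
        ranges[filled++] = i;
    }
    ret = (int)filled;
    free(ranges);

out_unlock:
    pthread_mutex_unlock(&hotplug_lock_model);
    return ret;
}
```

In this model, calling hotplug_add_model() twice between two calls to
prepare_headers_model() changes the result from 3 to 5, but a hotplug can
never land between the query and the population of a single call.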
