erofs pointer corruption and kernel crash

Gao Xiang hsiangkao at linux.alibaba.com
Fri Apr 10 18:59:34 AEST 2026



On 2026/4/10 16:51, Arseniy Krasnov wrote:
> 
> 
> 10.04.2026 11:42, Gao Xiang wrote:
>>
>>
>> On 2026/4/10 16:31, Gao Xiang wrote:
>>> Hi,
>>>
>>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>>> Hi,
>>>>
>>>> We found unexpected behaviour in erofs:
>>>>
>>>> There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
>>>> 'struct folio' as its first argument, and a loop inside this function
>>>> updates the 'private' field of the given folio:
>>>>
>>>>     do {
>>>>             orig = atomic_read((atomic_t *)&folio->private);
>>>>             DBG_BUGON(orig <= 0);
>>>>             v = dirty << EROFS_ONLINEFOLIO_DIRTY;
>>>>             v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>>>>     } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>>>
>>>> Now, we see that in some rare cases this function processes a folio whose
>>>> 'private' is a pointer, so this loop updates some bits in that
>>>> pointer. Later the kernel dereferences the pointer and crashes.
>>>>
>>>> To catch this, the following small debug patch was used (i.e. we check that the 'private' field is a pointer):
>>>>
>>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>>> index 33cb0a7330d2..b1d8deffec4d 100644
>>>> --- a/fs/erofs/data.c
>>>> +++ b/fs/erofs/data.c
>>>> @@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>>>>    {
>>>>        int orig, v;
>>>> +    if (((uintptr_t)folio->private) & 0xffff000000000000) {
>>>
>>> No, if erofs_onlinefolio_end() is called, `folio->private`
>>> shouldn't be a pointer; it's just a counter, and
>>> storing a pointer there is unexpected.
>>>
>>> And since the folio is locked, it shouldn't call into
>>> try_to_free_buffers().
>>>
>>> Is it easy to reproduce? If yes, can you print other
>>> values like `folio->mapping` and `folio->index` as
>>> well?
>>>
>>> I need more information to find some clues.
>>
>> btw, is that an unmodified upstream kernel "6.15.11-sdkernel"?
>> So far I have never heard of Android phone vendors using, for
>> example, 6.12 LTS hitting this. If it can be easily reproduced, is it
>> possible to reproduce with the upstream kernel?
> 
> Yes, this is just the upstream kernel, no vendor modifications. It is not Android, just
> buildroot.

I know, I mean buildroot workloads should be under
less pressure since they're just for embedded use.

> 
>>
>> And is the "0xffff000002b32468" pointer a valid pointer? What
>> does it point to? If it looks like an erofs pointer, the only one I
>> can think of is "struct z_erofs_pcluster"; if that's not the
>> case, I think there should be something else wrong if the kernel
>> is modified.
> 
> Yes, this is a valid pointer; I need to check what it points to. I'll report back here.

Anyway, if z_erofs_decompress_queue->erofs_onlinefolio_end()
is called:
   - the folio should be locked, and folio->private should not
     be a pointer;

   - it seems `PG_private` is set on the problematic folio
     (otherwise try_to_free_buffers() won't be called), which
     is unexpected too.
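
To illustrate why a pointer in `folio->private` gets damaged by that loop,
here is a minimal userspace sketch (not kernel code). It assumes a
little-endian 64-bit machine, so the `(atomic_t *)` cast covers the low 32
bits of the field, and it assumes bit positions 30 and 29 for the EIO and
DIRTY flags; please check fs/erofs/data.c in your tree. The names
onlinefolio_end_sim() and corrupt_pointer_sim() are hypothetical helpers
made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed bit positions; verify against fs/erofs/data.c in your tree. */
#define ONLINEFOLIO_EIO   30
#define ONLINEFOLIO_DIRTY 29

/*
 * Userspace stand-in for the cmpxchg loop: `low` plays the role of the
 * low 32 bits of folio->private. Each call drops the count by one and
 * ORs in the error and dirty bits.
 */
static uint32_t onlinefolio_end_sim(_Atomic uint32_t *low, int err, int dirty)
{
	uint32_t orig, v;

	do {
		orig = atomic_load(low);
		v = (uint32_t)dirty << ONLINEFOLIO_DIRTY;
		v |= (orig - 1) | ((uint32_t)!!err << ONLINEFOLIO_EIO);
	} while (!atomic_compare_exchange_weak(low, &orig, v));
	return v;
}

/*
 * Apply the update to a 64-bit value as if it were folio->private:
 * the high 32 bits are untouched, the low 32 bits go through the loop.
 */
static uint64_t corrupt_pointer_sim(uint64_t private_val, int err, int dirty)
{
	_Atomic uint32_t low = (uint32_t)private_val;

	onlinefolio_end_sim(&low, err, dirty);
	return (private_val & 0xffffffff00000000ULL) | atomic_load(&low);
}
```

With a real counter, corrupt_pointer_sim(2, 0, 0) gives 1 as expected. With
the reported value, corrupt_pointer_sim(0xffff000002b32468ULL, 0, 0) gives
0xffff000002b32467, i.e. the pointer is silently moved back by one byte, and
with err set it becomes 0xffff000042b32467. Either result still looks like a
kernel address, which would match a crash that happens only later on
dereference.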

So what I need for further analysis is:

   - the folio structure (folio flags, mapping, index, count, etc.);

   - what does folio->private point to?

Also, could I get a memory dump if possible?
I'm not quite sure whether that's feasible in a buildroot environment.

Thanks,
Gao Xiang

> 
> Thanks
> 
>>
>>>
>>> Thanks,
>>> Gao Xiang

