erofs pointer corruption and kernel crash
Arseniy Krasnov
avkrasnov at salutedevices.com
Fri Apr 10 23:27:25 AEST 2026
10.04.2026 15:20, Gao Xiang wrote:
>
>
> On 2026/4/10 19:37, Arseniy Krasnov wrote:
>
> (dropping unrelated folks since they are all subscribed to the erofs mailing list)
>
>>
>>
>> 10.04.2026 11:31, Gao Xiang wrote:
>>> Hi,
>>>
>>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>>> Hi,
>>>>
>>>> We found unexpected behaviour in erofs:
>>>>
>>>> There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
>>>> 'struct folio' as its first argument, and a loop inside this function
>>>> updates the 'private' field of the provided folio:
>>>>
>>>> do {
>>>>     orig = atomic_read((atomic_t *)&folio->private);
>>>>     DBG_BUGON(orig <= 0);
>>>>     v = dirty << EROFS_ONLINEFOLIO_DIRTY;
>>>>     v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>>>> } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>>>
>>>> Now we see that in some rare cases this function processes a folio whose
>>>> 'private' is a pointer, so the loop updates some bits of that
>>>> pointer. Later the kernel dereferences the corrupted pointer and crashes.
>>>>
>>>> To catch this, the following small debug patch was used (i.e. we check whether the 'private' field is a pointer):
>>>>
>>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>>> index 33cb0a7330d2..b1d8deffec4d 100644
>>>> --- a/fs/erofs/data.c
>>>> +++ b/fs/erofs/data.c
>>>> @@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>>>> {
>>>> int orig, v;
>>>> + if (((uintptr_t)folio->private) & 0xffff000000000000) {
>>>
>>> No, if erofs_onlinefolio_end() is called, `folio->private`
>>> shouldn't be a pointer: it only holds a counter, and
>>> storing a pointer there is unexpected.
>>>
>>> And since the folio is locked, it shouldn't call into
>>> try_to_free_buffers().
>>>
>>> Is it easy to reproduce? If yes, can you print other
>>> values like `folio->mapping` and `folio->index` as
>>> well?
>>>
>>> I need more information to find some clues.
>>
>>
>>
>> So I reproduced it again with this debug patch, which adds a magic value to 'struct z_erofs_pcluster' and prints 'struct folio'
>> when a pointer in 'private' is passed to 'erofs_onlinefolio_end()'. In short: 'private' points to a 'struct z_erofs_pcluster'.
> First, erofs-utils 1.8.10 doesn't support `-E48bit`:
> only erofs-utils 1.9+ ships it as an experimental
> feature, see the ChangeLog; so I think you're using
> a modified erofs-utils 1.8.10:
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/ChangeLog
>
> ```
> erofs-utils 1.9
>
> * This release includes the following updates:
> - Add 48-bit layout support for larger filesystems (EXPERIMENTAL);
> ```
>
> Second, I'm pretty sure this issue is related to the
> experimental `-E48bit`, and this information is
> not enough for me to find the root cause, so I
> need to find a way to reproduce it myself. It may
> take time; you could debug it yourself, but I don't
> think it's an easy task if you aren't quite familiar
> with the EROFS codebase.
>
> Anyway, if you need a quick solution for production,
> I really suggest you don't use `-E48bit + zstd` like
> this for now: try other options like
> `-zzstd -C65536 -Efragments` instead, since those
> are common production choices.
OK, thanks for this advice! One more question: currently we use these options:
"zstd,22 --max-extent-bytes 65536 -E48bit". OK, we will remove "zstd,22" and "-E48bit",
but what about "--max-extent-bytes 65536": is it considered a stable option?
Or is it better to use your version, "-zzstd -C65536 -Efragments"?
Thanks
>
> Thanks,
> Gao Xiang
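
P.S. To illustrate the corruption described in the original report, here is a
minimal userspace sketch (not kernel code) of what the quoted loop does when
'private' holds a pointer instead of a counter. It assumes a little-endian
64-bit machine, where the 32-bit atomic_t view of 'folio->private' covers the
low 32 bits of the slot, and it assumes bit positions 30 and 29 for
EROFS_ONLINEFOLIO_EIO and EROFS_ONLINEFOLIO_DIRTY (please check fs/erofs/data.c
in your tree). All helper names are hypothetical, and the atomic retry loop is
left out because only the resulting value matters here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define ONLINEFOLIO_EIO   30
#define ONLINEFOLIO_DIRTY 29

/*
 * One update step of the quoted loop, applied to a 32-bit value.
 * The kernel uses a signed int; unsigned arithmetic gives the same bits.
 */
static uint32_t onlinefolio_step(uint32_t orig, int err, bool dirty)
{
    uint32_t v = (uint32_t)dirty << ONLINEFOLIO_DIRTY;

    v |= (orig - 1) | ((uint32_t)!!err << ONLINEFOLIO_EIO);
    return v;
}

/*
 * Model of the 64-bit 'private' slot after the step: only the low
 * 32 bits are rewritten (little-endian assumption), the high half
 * is left untouched.
 */
static uint64_t private_after_end(uint64_t private_val, int err, bool dirty)
{
    uint64_t high = private_val & 0xffffffff00000000ULL;

    return high | onlinefolio_step((uint32_t)private_val, err, dirty);
}

/* Same heuristic as the debug patch: any of the top 16 bits set. */
static bool looks_like_pointer(uint64_t private_val)
{
    return (private_val & 0xffff000000000000ULL) != 0;
}
```

With a real counter, private_after_end(3, 0, false) gives 2 as intended. With
a pointer-like value, private_after_end(0xffff000012345678, 0, false) gives
0xffff000012345677: the "counter" decrement silently moves the pointer by one
byte, and the dirty or error flags would flip further bits in it.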