erofs pointer corruption and kernel crash

Arseniy Krasnov avkrasnov at salutedevices.com
Fri Apr 10 23:35:31 AEST 2026



10.04.2026 15:20, Gao Xiang wrote:
> 
> 
> On 2026/4/10 19:37, Arseniy Krasnov wrote:
> 
> (dropping unrelated folks since they are all subscribed to the erofs mailing list)
> 
>>
>>
>> 10.04.2026 11:31, Gao Xiang wrote:
>>> Hi,
>>>
>>> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>>>> Hi,
>>>>
>>>> We found unexpected behaviour in erofs:
>>>>
>>>> There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
>>>> 'struct folio' as its first argument, and there is a loop inside this function
>>>> which updates the 'private' field of the provided folio:
>>>>
>>>>     do {
>>>>             orig = atomic_read((atomic_t *)&folio->private);
>>>>             DBG_BUGON(orig <= 0);
>>>>             v = dirty << EROFS_ONLINEFOLIO_DIRTY;
>>>>             v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>>>>     } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>>>
>>>> Now we see that in some rare cases this function processes a folio whose
>>>> 'private' field holds a pointer, and thus this loop updates some bits in this
>>>> pointer. Later the kernel dereferences the corrupted pointer and crashes.
>>>>
>>>> To catch this, the following small debug patch was used (i.e. we check whether the 'private' field is a pointer):
>>>>
>>>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>>>> index 33cb0a7330d2..b1d8deffec4d 100644
>>>> --- a/fs/erofs/data.c
>>>> +++ b/fs/erofs/data.c
>>>> @@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>>>>    {
>>>>        int orig, v;
>>>>    +    if (((uintptr_t)folio->private) & 0xffff000000000000) {
>>>
>>> No, if erofs_onlinefolio_end() is called, `folio->private`
>>> shouldn't be a pointer, it's just a counter inside, and
>>> storing a pointer is unexpected.
>>>
>>> And since the folio is locked, it shouldn't call into
>>> try_to_free_buffers().
>>>
>>> Is it easy to reproduce? If yes, can you print other
>>> values like `folio->mapping` and `folio->index` as
>>> well?
>>>
>>> I need more information to find some clues.
>>
>>
>>
>> I reproduced it again with a debug patch which adds a magic value to 'struct z_erofs_pcluster' and prints 'struct folio'
>> when a pointer in 'private' is passed to 'erofs_onlinefolio_end()'. In short: 'private' points to a 'struct z_erofs_pcluster'.
> First, erofs-utils 1.8.10 doesn't support `-E48bit`:
> only erofs-utils 1.9+ ships it as an experimental
> feature, see the ChangeLog; so I think you're using
> a modified erofs-utils 1.8.10:
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/ChangeLog
> 
> ```
> erofs-utils 1.9
> 
>  * This release includes the following updates:
>    - Add 48-bit layout support for larger filesystems (EXPERIMENTAL);
> ```
> 
> Second, I'm pretty sure this issue is related to
> the experimental `-E48bit`, and this information is
> not enough for me to find the root cause, so I
> need to find a way to reproduce it myself. It may
> take time; you could debug it yourself, but I don't
> think it's an easy task if you aren't quite familiar
> with the EROFS codebase.

Also, here is some more information just caught with CONFIG_EROFS_FS_DEBUG. It is the same problem, but the enabled
debug logic hit a BUG in the kernel earlier. It may be useful for you.

Thanks


[  368.587000][  T608] ------------[ cut here ]------------
[  368.587079][  T608] kernel BUG at fs/erofs/zdata.c:1606!
[  368.591977][  T608] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
[  368.593622][ T1214] ------------[ cut here ]------------
[  368.598779][  T608] Modules linked in: vlsicomm(O)
[  368.604040][ T1214] kernel BUG at fs/erofs/zdata.c:1606!
[  368.608787][  T608] CPU: 1 UID: 0 PID: 608 Comm: kworker/1:3H Tainted: G           O        6.15.11-sdkernel #1 PREEMPT
[  368.624876][  T608] Tainted: [O]=OOT_MODULE
[  368.635015][  T608] Workqueue: kverityd verity_work
[  368.639844][  T608] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  368.647428][  T608] pc : z_erofs_endio+0x220/0x270
[  368.652172][  T608] lr : z_erofs_endio+0x23c/0x270
[  368.656920][  T608] sp : ffff80008215bbe0
[  368.660887][  T608] x29: ffff80008215bbe0 x28: ffff0000032feb40 x27: ffff0000032feb80
[  368.668646][  T608] x26: fffffdffc0029280 x25: 0000000000000009 x24: ffff000000be17e0
[  368.676408][  T608] x23: ffff000007e85c00 x22: 0000000000001000 x21: 0000000000001000
[  368.684170][  T608] x20: 0000000000000000 x19: 0000000000001000 x18: 00000000e6fb12fd
[  368.691933][  T608] x17: 00000000c98c11f0 x16: 00000000ac7e39e2 x15: 00000000c3362985
[  368.699696][  T608] x14: 0000000001040820 x13: 00000000a3bddb58 x12: ffff80008215bb68
[  368.707458][  T608] x11: 0000000049a63821 x10: ffff8000809febe0 x9 : 0000000000000000
[  368.715221][  T608] x8 : ffff000003cee8e8 x7 : 0000000000000000 x6 : 459ea227f0118cc9
[  368.722983][  T608] x5 : 0000000000000000 x4 : 1ff0000000004021 x3 : 0000000000000000
[  368.730746][  T608] x2 : 0000000000000000 x1 : ffff0000029f3e00 x0 : fffffdffc0029240
[  368.738513][  T608] Call trace:
[  368.741619][  T608]  z_erofs_endio+0x220/0x270 (P)
[  368.746362][  T608]  bio_endio+0x138/0x150
[  368.750411][  T608]  __dm_io_complete+0x1e0/0x2b0
[  368.755068][  T608]  clone_endio+0xd0/0x270
[  368.759213][  T608]  bio_endio+0x138/0x150
[  368.763262][  T608]  verity_finish_io+0x64/0xf0
[  368.767747][  T608]  verity_work+0x30/0x40
[  368.771800][  T608]  process_one_work+0x180/0x2e0
[  368.776463][  T608]  worker_thread+0x2c4/0x3f0
[  368.780862][  T608]  kthread+0x12c/0x210
[  368.784742][  T608]  ret_from_fork+0x10/0x20
[  368.788979][  T608] Code: 17ffffc8 f9401401 b100103f 54fff5a0 (d4210000)
[  368.795698][  T608] ---[ end trace 0000000000000000 ]---
[  368.813672][  T608] Kernel panic - not syncing: Oops - BUG: Fatal exception
[  368.815015][  T608] SMP: stopping secondary CPUs
[  369.896670][  T608] SMP: failed to stop secondary CPUs 0
[  369.896729][  T608] Kernel Offset: disabled
[  369.900508][  T608] CPU features: 0x0000,00000000,01000000,0200420b
[  369.906718][  T608] Memory Limit: none
[  369.922397][  T608] Rebooting in 5 seconds..
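
For illustration, the effect described in the original report (the cmpxchg
loop in 'erofs_onlinefolio_end()' running on a 'private' field that holds a
pointer) can be modelled in user space. This is only a sketch, not kernel
code. Assumptions: the function names are hypothetical, the bit positions 29
and 30 for EROFS_ONLINEFOLIO_DIRTY / EROFS_ONLINEFOLIO_EIO should be checked
against the tree, and the machine is little-endian 64-bit, so the atomic_t
cast covers the low 32 bits of 'private'.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against fs/erofs/internal.h in your tree. */
#define ONLINEFOLIO_EIO   30
#define ONLINEFOLIO_DIRTY 29

/*
 * Hypothetical model of one erofs_onlinefolio_end() update applied to a
 * 64-bit 'private' word. The kernel loop goes through an atomic_t, so only
 * 32 bits are read and written; the upper 32 bits are left untouched.
 * Unsigned arithmetic is used here to avoid signed overflow; the resulting
 * bit pattern matches the two's-complement 'orig - 1' of the kernel loop.
 */
static uint64_t model_onlinefolio_end(uint64_t private_word, int err, bool dirty)
{
    uint32_t orig = (uint32_t)private_word;          /* low 32 bits */
    uint32_t v = (uint32_t)dirty << ONLINEFOLIO_DIRTY;

    v |= (orig - 1u) | ((uint32_t)!!err << ONLINEFOLIO_EIO);
    return (private_word & 0xffffffff00000000ull) | v;
}

/* Same heuristic as the debug patch: any of the upper 16 bits set. */
static bool looks_like_kernel_pointer(uint64_t private_word)
{
    return (private_word & 0xffff000000000000ull) != 0;
}
```

With a plain counter of 2, one call with err = 0 and dirty = false yields 1.
With a made-up pointer-like value such as 0xffff000012345600, the same call
yields 0xffff0000123455ff: it still passes the pointer heuristic, but it no
longer refers to the original object.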



> 
> Anyway, if you need a quick solution for production,
> I really suggest you don't use `-E48bit + zstd` like
> this for now: try other options like
> `-zzstd -C65536 -Efragments` instead, since those
> are common production choices.
> 
> Thanks,
> Gao Xiang


