erofs pointer corruption and kernel crash
Gao Xiang
hsiangkao at linux.alibaba.com
Fri Apr 10 18:42:52 AEST 2026
On 2026/4/10 16:31, Gao Xiang wrote:
> Hi,
>
> On 2026/4/10 16:13, Arseniy Krasnov wrote:
>> Hi,
>>
>> We found unexpected behaviour of erofs:
>>
>> There is a function in erofs, 'erofs_onlinefolio_end()'. It takes a pointer to
>> 'struct folio' as its first argument, and a loop inside this function
>> updates the 'private' field of the provided folio:
>>
>> do {
>> orig = atomic_read((atomic_t *)&folio->private);
>> DBG_BUGON(orig <= 0);
>> v = dirty << EROFS_ONLINEFOLIO_DIRTY;
>> v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>> } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>>
>> Now, we see that in some rare cases this function processes a folio whose
>> 'private' is a pointer, and thus this loop updates some bits in this
>> pointer. Later the kernel dereferences this pointer and crashes.
>>
>> To catch this, the following small debug patch was used (i.e. we check whether the 'private' field is a pointer):
>>
>> diff --git a/fs/erofs/data.c b/fs/erofs/data.c
>> index 33cb0a7330d2..b1d8deffec4d 100644
>> --- a/fs/erofs/data.c
>> +++ b/fs/erofs/data.c
>> @@ -238,6 +238,11 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>> {
>> int orig, v;
>> + if (((uintptr_t)folio->private) & 0xffff000000000000) {
>
> No, if erofs_onlinefolio_end() is called, `folio->private`
> shouldn't be a pointer, it's just a counter inside, and
> storing a pointer is unexpected.
>
> And since the folio is locked, it shouldn't call into
> try_to_free_buffers().
>
> Is it easy to reproduce? If yes, can you print other
> values like `folio->mapping` and `folio->index` as
> well?
>
> I need more information to find some clues.
By the way, is "6.15.11-sdkernel" an unmodified upstream kernel?
So far I have never heard of Android phone vendors hitting this,
for example with 6.12 LTS. If it can be easily reproduced, is it
possible to reproduce it with the upstream kernel?
And is "0xffff000002b32468" a valid pointer? What
does it point to? If it looks like an erofs pointer, the only one I
can think of is "struct z_erofs_pcluster"; if that's not the
case, I think something else may be wrong if the kernel
is modified.
>
> Thanks,
> Gao Xiang
>
>> + pr_emerg("\n[foliodbg] %s:%d EROFS FOLIO %px PRIVATE BEFORE %px\n", __func__, __LINE__, folio, folio->private);
>> + dump_stack();
>> + }
>> +
>> do {
>> orig = atomic_read((atomic_t *)&folio->private);
>> DBG_BUGON(orig <= 0);
>> @@ -245,6 +250,9 @@ void erofs_onlinefolio_end(struct folio *folio, int err, bool dirty)
>> v |= (orig - 1) | (!!err << EROFS_ONLINEFOLIO_EIO);
>> } while (atomic_cmpxchg((atomic_t *)&folio->private, orig, v) != orig);
>> + if (((uintptr_t)folio->private) & 0xffff000000000000)
>> + pr_emerg("\n[foliodbg] %s:%d EROFS FOLIO %px PRIVATE SET %px\n", __func__, __LINE__, folio, folio->private);
>> +
>> if (v & (BIT(EROFS_ONLINEFOLIO_DIRTY) - 1))
>> return;
>> folio->private = 0;
>>
>>
>> And it gives result:
>>
>> [][ T639] [foliodbg] erofs_onlinefolio_end:242 EROFS FOLIO fffffdffc0030440 PRIVATE BEFORE ffff000002b32468
>> [][ T639] CPU: 0 UID: 0 PID: 639 Comm: kworker/0:6H Tainted: G O 6.15.11-sdkernel #1 PREEMPT
>> [][ T639] Tainted: [O]=OOT_MODULE
>> [][ T639] Workqueue: kverityd verity_work
>> [][ T639] Call trace:
>> [][ T639] show_stack+0x18/0x30 (C)
>> [][ T639] dump_stack_lvl+0x60/0x80
>> [][ T639] dump_stack+0x18/0x24
>> [][ T639] erofs_onlinefolio_end+0x124/0x130
>> [][ T639] z_erofs_decompress_queue+0x4b0/0x8c0
>> [][ T639] z_erofs_decompress_kickoff+0x88/0x150
>> [][ T639] z_erofs_endio+0x144/0x250
>> [][ T639] bio_endio+0x138/0x150
>> [][ T639] __dm_io_complete+0x1e0/0x2b0
>> [][ T639] clone_endio+0xd0/0x270
>> [][ T639] bio_endio+0x138/0x150
>> [][ T639] verity_finish_io+0x64/0xf0
>> [][ T639] verity_work+0x30/0x40
>> [][ T639] process_one_work+0x180/0x2e0
>> [][ T639] worker_thread+0x2c4/0x3f0
>> [][ T639] kthread+0x12c/0x210
>> [][ T639] ret_from_fork+0x10/0x20
>> [][ T639]
>> [][ T639] [foliodbg] erofs_onlinefolio_end:254 EROFS FOLIO fffffdffc0030440 PRIVATE SET ffff000022b32467
>> [][ T39] Unable to handle kernel paging request at virtual address ffff000022b32467
>> [][ T39] Mem abort info:
>> [][ T39] ESR = 0x0000000096000006
>> [][ T39] EC = 0x25: DABT (current EL), IL = 32 bits
>> [][ T39] SET = 0, FnV = 0
>> [][ T39] EA = 0, S1PTW = 0
>> [][ T39] FSC = 0x06: level 2 translation fault
>> [][ T39] Data abort info:
>> [][ T39] ISV = 0, ISS = 0x00000006, ISS2 = 0x00000000
>> [][ T39] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [][ T39] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [][ T39] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000001e36000
>> [][ T39] [ffff000022b32467] pgd=1800000007fff403, p4d=1800000007fff403, pud=1800000007ffe403, pmd=0000000000000000
>> [][ T39] Internal error: Oops: 0000000096000006 [#1] SMP
>> [][ T39] Modules linked in: vlsicomm(O)
>> [][ T39] CPU: 1 UID: 0 PID: 39 Comm: kswapd0 Tainted: G O 6.15.11-sdkernel #1 PREEMPT
>> [][ T39] Tainted: [O]=OOT_MODULE
>> [][ T39] pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [][ T39] pc : drop_buffers.constprop.0+0x34/0x120
>> [][ T39] lr : try_to_free_buffers+0xd0/0x100
>> [][ T39] sp : ffff80008105b780
>> [][ T39] x29: ffff80008105b780 x28: 0000000000000000 x27: fffffdffc0030448
>> [][ T39] x26: ffff80008105b8a0 x25: ffff80008105b868 x24: 0000000000000001
>> [][ T39] x23: fffffdffc0030440 x22: ffff80008105b7b0 x21: fffffdffc0030440
>> [][ T39] x20: ffff000022b32467 x19: ffff000022b32467 x18: 0000000000000000
>> [][ T39] x17: 0000000000000000 x16: 0000000000000000 x15: 00000000d69f4cc0
>> [][ T39] x14: ffff0000000c5dc0 x13: 0000000000000000 x12: ffff800080d59b58
>> [][ T39] x11: 00000000000000c0 x10: 0000000000000000 x9 : 0000000000000000
>> [][ T39] x8 : ffff80008105b7d0 x7 : 0000000000000000 x6 : 000000000000003f
>> [][ T39] x5 : 0000000000000000 x4 : fffffdffc0030440 x3 : 1ff0000000004001
>> [][ T39] x2 : 1ff0000000004001 x1 : ffff80008105b7b0 x0 : fffffdffc0030440
>> [][ T39] Call trace:
>> [][ T39] drop_buffers.constprop.0+0x34/0x120 (P)
>> [][ T39] try_to_free_buffers+0xd0/0x100
>> [][ T39] filemap_release_folio+0x94/0xc0
>> [][ T39] shrink_folio_list+0x8c8/0xc40
>> [][ T39] shrink_lruvec+0x740/0xb80
>> [][ T39] shrink_node+0x2b8/0x9a0
>> [][ T39] balance_pgdat+0x3b8/0x760
>> [][ T39] kswapd+0x220/0x3b0
>> [][ T39] kthread+0x12c/0x210
>> [][ T39] ret_from_fork+0x10/0x20
>> [][ T39] Code: 14000004 f9400673 eb13029f 54000180 (f9400262)
>> [][ T39] ---[ end trace 0000000000000000 ]---
>> [][ T39] Kernel panic - not syncing: Oops: Fatal exception
>> [][ T39] SMP: stopping secondary CPUs
>> [][ T39] Kernel Offset: disabled
>> [][ T39] CPU features: 0x0000,00000000,01000000,0200420b
>> [][ T39] Memory Limit: none
>> [][ T39] Rebooting in 5 seconds..
>>
>> So 'erofs_onlinefolio_end()' takes a folio whose 'private' field contains
>> a pointer (0xffff000002b32468) and "corrupts" this pointer (the result is
>> 0xffff000022b32467; at least we see that 0x20000000, which is
>> (1 << EROFS_ONLINEFOLIO_DIRTY), was ORed into the original pointer), and then the kernel crashes.
>> We guess that passing such a folio as an argument to
>> 'erofs_onlinefolio_end()' is not a valid case.
>>
>> We have the following erofs configuration in buildroot:
>>
>> BR2_TARGET_ROOTFS_EROFS=y
>> BR2_TARGET_ROOTFS_EROFS_CUSTOM_COMPRESSION=y
>> BR2_TARGET_ROOTFS_EROFS_COMPRESSION_ALGORITHMS="zstd,22 --max-extent-bytes 65536 -E48bit"
>> BR2_TARGET_ROOTFS_EROFS_FRAGMENTS=y
>> BR2_TARGET_ROOTFS_EROFS_PCLUSTERSIZE=65536
>>
>>
>>
>> Maybe you know how to fix it, or have some ideas? We are new to erofs and still need to
>> study and learn its source code.
>>
>> Thanks
>
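
The arithmetic in the report (0xffff000002b32468 becoming 0xffff000022b32467) can be checked with a small userspace model of the update loop. This is only a sketch under stated assumptions: the helper names are hypothetical, EROFS_ONLINEFOLIO_DIRTY = 29 is taken from the 0x20000000 value in the report, EROFS_ONLINEFOLIO_EIO = 30 is assumed, and the counter is assumed to overlay the low 32 bits of 'private' (as a 32-bit atomic_t at that address would on a little-endian 64-bit system).

```c
#include <assert.h>
#include <stdint.h>

/* DIRTY = 29 follows from the report (1 << 29 == 0x20000000).
 * EIO = 30 is an assumption made for this sketch. */
#define ONLINEFOLIO_EIO   30
#define ONLINEFOLIO_DIRTY 29

/*
 * Hypothetical userspace model of one pass of the loop in
 * erofs_onlinefolio_end(): only the low 32 bits of 'private' are treated
 * as the counter, the high 32 bits are left untouched.
 * Returns the resulting 64-bit value of 'private'.
 */
static uint64_t model_onlinefolio_end(uint64_t private_val, int err, int dirty)
{
	uint32_t orig = (uint32_t)private_val;
	uint32_t v;

	assert(orig != 0);	/* loosely mirrors DBG_BUGON(orig <= 0) */
	v = (uint32_t)!!dirty << ONLINEFOLIO_DIRTY;
	v |= (orig - 1) | ((uint32_t)!!err << ONLINEFOLIO_EIO);
	return (private_val & 0xffffffff00000000ULL) | v;
}

/*
 * Nonzero if the sub-request count below the DIRTY bit is still nonzero,
 * i.e. the real function would return before resetting 'private' to 0.
 */
static int model_still_pending(uint64_t private_val)
{
	return ((uint32_t)private_val & ((1u << ONLINEFOLIO_DIRTY) - 1)) != 0;
}
```

Calling model_onlinefolio_end(0xffff000002b32468ULL, 0, 1) reproduces the "PRIVATE SET" value from the log, and model_still_pending() on that result is nonzero. In this model that means the early return is taken, so the modified pointer stays in 'private', where try_to_free_buffers() later finds it.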