Weird EROFS data corruption
Gao Xiang
hsiangkao at linux.alibaba.com
Wed Dec 6 01:34:37 AEDT 2023
On 2023/12/5 22:23, Juhyung Park wrote:
> Hi Gao,
>
> On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
>>
>> Hi Juhyung,
>>
>> On 2023/12/4 11:41, Juhyung Park wrote:
>>
>> ...
>>>
>>>>
>>>> - Could you share the full message about the output of `lscpu`?
>>>
>>> Sure:
>>>
>>> Architecture: x86_64
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 39 bits physical, 48 bits virtual
>>> Byte Order: Little Endian
>>> CPU(s): 8
>>> On-line CPU(s) list: 0-7
>>> Vendor ID: GenuineIntel
>>> BIOS Vendor ID: Intel(R) Corporation
>>> Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
>>> BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz
>>> BIOS CPU family: 198
>>> CPU family: 6
>>> Model: 140
>>> Thread(s) per core: 2
>>> Core(s) per socket: 4
>>> Socket(s): 1
>>> Stepping: 1
>>> CPU(s) scaling MHz: 60%
>>> CPU max MHz: 4800.0000
>>> CPU min MHz: 400.0000
>>> BogoMIPS: 5990.40
>>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
>>> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
>>> syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs
>>> bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
>>> tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est
>>> tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe
>>> popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm
>>> 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp
>>> ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase
>>> tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f
>>> avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt
>>> avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
>>> split_lock_detect dtherm ida arat pln pts hwp hwp_notify
>>> hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke
>>> avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme
>>> avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i
>>
>> Sigh, I've been thinking about this. FSRM is the most significant
>> difference between our environments. Could you try only the following
>> diff (without the previous disable patch) to see if it makes any difference?
>>
>> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
>> index 1b60ae81ecd8..1b52a913233c 100644
>> --- a/arch/x86/lib/memmove_64.S
>> +++ b/arch/x86/lib/memmove_64.S
>> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
>> #define CHECK_LEN cmp $0x20, %rdx; jb 1f
>> #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
>> .Lmemmove_begin_forward:
>> - ALTERNATIVE_2 __stringify(CHECK_LEN), \
>> - __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
>> - __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
>> + CHECK_LEN
>>
>> /*
>> * movsq instruction have many startup latency
>
> Yup, that also seems to fix it.
> Are we looking at a potential memmove issue?

I'm still analyzing this behavior and its root cause. I will also try
to get a recent cloud server with FSRM myself to find more clues.
Thanks,
Gao Xiang