Weird EROFS data corruption
Juhyung Park
qkrwngud825 at gmail.com
Wed Dec 6 01:43:09 AEDT 2023
On Tue, Dec 5, 2023 at 11:34 PM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
>
>
>
> On 2023/12/5 22:23, Juhyung Park wrote:
> > Hi Gao,
> >
> > On Tue, Dec 5, 2023 at 4:32 PM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
> >>
> >> Hi Juhyung,
> >>
> >> On 2023/12/4 11:41, Juhyung Park wrote:
> >>
> >> ...
> >>>
> >>>>
> >>>> - Could you share the full message about the output of `lscpu`?
> >>>
> >>> Sure:
> >>>
> >>> Architecture: x86_64
> >>> CPU op-mode(s): 32-bit, 64-bit
> >>> Address sizes: 39 bits physical, 48 bits virtual
> >>> Byte Order: Little Endian
> >>> CPU(s): 8
> >>> On-line CPU(s) list: 0-7
> >>> Vendor ID: GenuineIntel
> >>> BIOS Vendor ID: Intel(R) Corporation
> >>> Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
> >>> BIOS Model name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz None CPU @ 3.0GHz
> >>> BIOS CPU family: 198
> >>> CPU family: 6
> >>> Model: 140
> >>> Thread(s) per core: 2
> >>> Core(s) per socket: 4
> >>> Socket(s): 1
> >>> Stepping: 1
> >>> CPU(s) scaling MHz: 60%
> >>> CPU max MHz: 4800.0000
> >>> CPU min MHz: 400.0000
> >>> BogoMIPS: 5990.40
> >>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
> >>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx
> >>> pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl
> >>> xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq
> >>> dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
> >>> pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave
> >>> avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2
> >>> cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept
> >>> vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a
> >>> avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt
> >>> avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
> >>> split_lock_detect dtherm ida arat pln pts hwp hwp_notify
> >>> hwp_act_window hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke
> >>> avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme
> >>> avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2i
> >>
> >> Sigh, I've been thinking. FSRM is the most significant difference between
> >> our environments. Could you try only the following diff (without the
> >> previous disable patch) and see whether it makes any difference?
> >>
> >> diff --git a/arch/x86/lib/memmove_64.S b/arch/x86/lib/memmove_64.S
> >> index 1b60ae81ecd8..1b52a913233c 100644
> >> --- a/arch/x86/lib/memmove_64.S
> >> +++ b/arch/x86/lib/memmove_64.S
> >> @@ -41,9 +41,7 @@ SYM_FUNC_START(__memmove)
> >>  #define CHECK_LEN cmp $0x20, %rdx; jb 1f
> >>  #define MEMMOVE_BYTES movq %rdx, %rcx; rep movsb; RET
> >>  .Lmemmove_begin_forward:
> >> -       ALTERNATIVE_2 __stringify(CHECK_LEN), \
> >> -                     __stringify(CHECK_LEN; MEMMOVE_BYTES), X86_FEATURE_ERMS, \
> >> -                     __stringify(MEMMOVE_BYTES), X86_FEATURE_FSRM
> >> +       CHECK_LEN
> >>
> >>         /*
> >>          * movsq instruction have many startup latency
> >
> > Yup, that also seems to fix it.
> > Are we looking at a potential memmove issue?
>
> I'm still analyzing this behavior as well as the root cause and
> I will also try to get a recent cloud server with FSRM myself
> to find more clues.

Down the rabbit hole we go...

Let me know if you have trouble getting an instance with FSRM. I'll
see what I can do.
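
In case it helps when picking an instance, here is a rough sketch for
checking whether a machine advertises ERMS/FSRM and for modelling which
forward-copy variant the unpatched ALTERNATIVE_2 in the diff above would
select. It is only an illustration: the function names and the returned
labels are made up for this sketch and do not come from the kernel, and it
assumes the x86 layout of /proc/cpuinfo (a "flags" line).

```python
#!/usr/bin/env python3
"""Report ERMS/FSRM and model the forward-copy selection in __memmove.

Illustrative sketch only; names and labels are hypothetical.
"""

SMALL_COPY_LIMIT = 0x20  # threshold tested by CHECK_LEN in the quoted diff


def cpu_flags(cpuinfo_text):
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def forward_copy_path(length, flags):
    """Model the ALTERNATIVE_2 choice at .Lmemmove_begin_forward.

    The FSRM alternative takes precedence over the ERMS one when both
    features are present. Returned strings are descriptive labels only.
    """
    if "fsrm" in flags:
        # MEMMOVE_BYTES alone: rep movsb for every length.
        return "rep movsb"
    if length < SMALL_COPY_LIMIT:
        # CHECK_LEN jumps to the small-copy label (1f).
        return "small-copy path"
    if "erms" in flags:
        # CHECK_LEN; MEMMOVE_BYTES: rep movsb for 0x20 bytes and up.
        return "rep movsb"
    # Plain CHECK_LEN falls through to the remaining forward-copy code.
    return "generic forward copy"


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    print("erms:", "erms" in flags, "fsrm:", "fsrm" in flags)
    for length in (8, 0x1F, 0x20, 4096):
        print(f"{length:5d} bytes -> {forward_copy_path(length, flags)}")
```

With the diff applied, only the plain CHECK_LEN remains, which in this
model corresponds to calling forward_copy_path() with an empty flag set.
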
>
> Thanks,
> Gao Xiang