[PATCH v2 1/1] arch/fault: don't print logs for pte marker poison errors

Wed May 15 07:34:24 AEST 2024

On Tue, May 14, 2024 at 10:26:49PM +0200, Oscar Salvador wrote:
> On Fri, May 10, 2024 at 03:29:48PM -0400, Peter Xu wrote:
> > IMHO we shouldn't mention that detail, but only state the effect which is
> > to not report the event to syslog.
> > 
> > There's no hard rule that a pte marker can't reflect a real page poison in
> > the future even MCE.  Actually I still remember most places don't care
> > about the pfn in the hwpoison swap entry so maybe we can even do it? But
> > that's another story regardless..
> 
> But we should not use pte markers for real hwpoisons events (aka MCE), right?

The question is whether we can't.

Now we reserved a swp entry just for hwpoison and it makes sense only
because we cached the poisoned pfn inside.  My long standing question is
why do we ever need that pfn after all.  If we don't need the pfn, we
simply need a bit in the pgtable entry saying that it's poisoned, if
accessed we should kill the process using sigbus.

I used to comment on this before, the only path that uses that pfn is
check_hwpoisoned_entry(), which was introduced in:

commit a3f5d80ea401ac857f2910e28b15f35b2cf902f4
Author: Naoya Horiguchi <nao.horiguchi at gmail.com>
Date:   Mon Jun 28 19:43:14 2021 -0700

    mm,hwpoison: send SIGBUS with error virutal address

    Now an action required MCE in already hwpoisoned address surely sends a
    SIGBUS to current process, but the SIGBUS doesn't convey error virtual
    address.  That's not optimal for hwpoison-aware applications.

    To fix the issue, make memory_failure() call kill_accessing_process(),
    that does pagetable walk to find the error virtual address.  It could find
    multiple virtual addresses for the same error page, and it seems hard to
    tell which virtual address is correct one.  But that's rare and sending
    incorrect virtual address could be better than no address.  So let's
    report the first found virtual address for now.

So this time I read more on this and Naoya explained why - it's only used
so far to dump the VA of the poisoned entry.

However what confused me is, if an entry is poisoned already logically we
dump that message in the fault handler not memory_failure(), which is:

MCE: Killing uffd-unit-tests:650 due to hardware memory corruption fault at 7f3589d7e000

So perhaps we're trying to also dump that when the MCEs (points to the same
pfn) are only generated concurrently?  I donno much on hwpoison so I cannot
tell, there's also implication where it's only triggered if
MF_ACTION_REQUIRED.  But I think it means hwpoison may work without pfn
encoded, but I don't know the implication to lose that dmesg line.

> I mean, we do have the means to mark a page as hwpoisoned when a real
> MCE gets triggered, why would we want a pte marker to also reflect that?
> Or is that something for userfaultd realm?

No it's not userfaultfd realm.. it's just that pte marker should be a
generic concept, so it logically can be used outside userfaultfd.  That's
also why it's used in swapin errors, in which case we don't use anything
else in this case but a bit to reflect "this page is bad".

> 
> > And also not report swapin error is, IMHO, only because arch errors said
> > "MCE" in the error logs which may not apply here.  Logically speaking
> > swapin error should also be reported so admin knows better on why a proc is
> > killed.  Now it can still confuse the admin if it really happens, iiuc.
> 
> I am bit confused by this.
> It seems we create poisoned pte markers on swap errors (e.g:
> unuse_pte()), which get passed down the chain with VM_FAULT_HWPOISON,
> which end up in sigbus (I guess?).
> 
> This all seems very subtle to me.
> 
> First of all, why not passing VM_FAULT_SIGBUS if that is what will end
> up happening?
> I mean, at the moment that is not possible because we convolute swaping
> errors and uffd poison in the same type of marker, so we do not have any
> means to differentiate between the two of them.
> 
> Would it make sense to create yet another pte marker type to split that
> up? Because when I look at VM_FAULT_HWPOISON, I get reminded of MCE
> stuff, and that does not hold here.

We used to not dump error for swapin error.  Note that here what I am
saying is not that Axel is doing things wrong, but it's just that logically
swapin error (as pte marker) can also be with !QUIET, so my final point is
we may want to avoid having the assumption that "pte marker should always
be QUITE", because I want to make it clear that pte marker can used in any
form, so itself shouldn't imply anything..

Thanks,

-- 
Peter Xu