[kvm-unit-tests PATCH 14/32] powerpc: general interrupt tests

Tue Mar 5 23:12:24 AEDT 2024

On Tue, Mar 05, 2024 at 07:26:18AM +0100, Thomas Huth wrote:
> On 05/03/2024 03.19, Nicholas Piggin wrote:
> > On Fri Mar 1, 2024 at 10:41 PM AEST, Thomas Huth wrote:
> > > On 26/02/2024 11.12, Nicholas Piggin wrote:
> > > > Add basic testing of various kinds of interrupts, machine check,
> > > > page fault, illegal, decrementer, trace, syscall, etc.
> > > > 
> > > > This has a known failure on QEMU TCG pseries machines where MSR[ME]
> > > > can be incorrectly set to 0.
> > > 
> > > Two questions out of curiosity:
> > > 
> > > Any chance that this could be fixed easily in QEMU?
> > 
> > Yes I have a fix on the mailing list. It should get into 9.0 and
> > probably stable.
> 
> Ok, then it's IMHO not worth the effort to make the k-u-t work around this
> bug in older QEMU versions.
> 
> > > Or is there a way to detect TCG from within the test? (for example, we have
> > > a host_is_tcg() function for s390x so we can e.g. use report_xfail() for
> > > tests that are known to fail on TCG there)
> > 
> > I do have a half-done patch which adds exactly this.
> > 
> > One (minor) annoyance is that it doesn't seem possible to detect QEMU
> > version to add workarounds. E.g., we would like to test the fixed
> > functionality, but older qemu should not. Maybe that's going too much
> > into a rabbit hole. We *could* put a QEMU version into device tree
> > to deal with this though...
> 
> No, let's better not do this - hardwired version checks are often a bad
> idea, e.g. when a bug is in QEMU 8.0.0 and 8.1.0, it could be fixed in 8.0.1
> and then it could get really messy with the version checks...
>

We've tried to address this type of issue (but for KVM, so kernel versions
instead of QEMU versions) in the past by inventing the errata framework,
which is based on environment variables. Instead of checking for versions,
we check for a hash (which is just the commit hash of the fix). While we
do guess that the fix is present by version number, it can always be
manually set as present as well. In any case, the test is simply skipped
when the errata environment variable isn't present, so in the worst case
we lose some coverage we could have had, but the rest of the tests still
complete and we don't get the same failures over and over. An example of
its use is in arm/psci.c. Look for the ERRATA() calls.

We could extend the errata framework for QEMU/TCG. We just need to add
another bit of data to the errata.txt file for it to know it should
check QEMU versions instead of kernel versions for those errata. We can
also ignore the errata framework and just create the errata environment
variable which would by 'n' by default now and later, after distros have
fixes, it could be changed to 'y'.

Thanks,
drew