Disabling ppc64le hosts for Qemu tests

Joel Stanley joel at jms.id.au
Wed Nov 14 10:43:57 AEDT 2018


On Wed, 14 Nov 2018 at 07:00, Andrew Geissler <geissonator at gmail.com> wrote:
>
> On Mon, Nov 12, 2018 at 11:08 PM Joel Stanley <joel at jms.id.au> wrote:
> >
> > Hello Andrew,
> >
> > I've been trying to get to the bottom of the slow-booting qemu issue
> > for a few weeks now. We committed a fix that resolves the issue on x86
> > hosts, but when running the same Qemu build on ppc64le it still has
> > the "go slow" behaviour.
> >
> >  https://github.com/openbmc/qemu/issues/14
> >
> > I propose we run the Romulus Qemu boot test on x86-only build slaves
> > while we work this out. That will allow us to unblock the kernel
> > security bumps, which have been pending for a few weeks.
>
> Unfortunately there's no way that I can find in the Jenkins multi-configuration
> matrix plugin to specify which part of an axis goes to which slave node.
>
> The flow right now is that whichever node gets assigned to build the
> image is then used to run the QEMU job (so no transfers of
> images have to occur).

This sounds like the easiest thing to fix. We would still maintain all
four CI boxes, but ensure the Qemu job is run on an x86 box.

We could even run the Qemu job on openpower.xyz (aka 'slave') itself,
as the images are already being copied here.

> The assignment of which node builds
> which config is random so builder4 (our ppc64le node) gets Romulus
> about 25% of the time.
>
> Removing builder4 from the build queue would work, but would
> also remove 25% of our build capacity.

Is there a reason why we don't use the "openbmc-ci" build slave more?

 https://openpower.xyz/computer/openbmc-ci1/builds

> > I'll continue working on fixing Qemu in the future, but I don't have
> > time for the next two weeks due to some higher priority work.
>
> This feels like one of those things that's always tough to come back
> to once you let a workaround in.

I don't understand? There's no workaround going into Qemu, this is
simply for the Jenkins issue. Once Qemu is fixed we can re-enable the
Jenkins slave.

> Is there a specific kernel commit
> you've bisected down to that we could just pull from the openbmc/linux
> tree? Is there a workaround we could put in our openbmc/qemu for
> now? I'd prefer both of these over just disabling ppc64le.

The fix corrects a nasty bug in the Linux timer driver. Reverting it
would leave timers in Linux running inaccurately, which is what the
system currently does. I would consider it a high priority fix to have
in the product.

The Qemu workaround is already in place, and works for x86. It doesn't
work for ppc64le.

