Disabling ppc64le hosts for Qemu tests

Joel Stanley joel at jms.id.au
Thu Nov 15 10:34:55 AEDT 2018


On Thu, 15 Nov 2018 at 08:15, Andrew Geissler <geissonator at gmail.com> wrote:
>
> On Tue, Nov 13, 2018 at 5:44 PM Joel Stanley <joel at jms.id.au> wrote:
> >
> > On Wed, 14 Nov 2018 at 07:00, Andrew Geissler <geissonator at gmail.com> wrote:
> > >
> > > On Mon, Nov 12, 2018 at 11:08 PM Joel Stanley <joel at jms.id.au> wrote:
> > > >
> > > > Hello Andrew,
> > > >
> > > > I've been trying to get to the bottom of the slow-booting qemu issue
> > > > for a few weeks now. We committed a fix that resolves the issue on x86
> > > > hosts, but when running the same Qemu build on ppc64le it still has
> > > > the "go slow" behaviour.
> > > >
> > > >  https://github.com/openbmc/qemu/issues/14
> > > >
> > > > I propose we run the Romulus Qemu boot test on x86-only build slaves
> > > > while we work this out. That will allow us to unblock the kernel
> > > > security bumps, which have been pending for a few weeks.
> > >
> > > Unfortunately there's no way that I can find in the jenkins multi-configuration
> > > matrix plugin to specify which part of an axis goes to which slave node.
> > >
> > > The flow right now is that whichever node gets assigned to build the
> > > image is then used to run the QEMU job (so no transfers of
> > > images have to occur).
> >
> > This sounds like the easiest thing to fix. We would still maintain all
> > four CI boxes, but ensure the Qemu job is run on an x86 box.
> >
> > We could even run the Qemu job on openpower.xyz (aka 'slave') itself,
> > as the images are already being copied here.
>
> I think to do this, we'd need something like
> https://stackoverflow.com/questions/7133027/retrieve-build-number-or-artifacts-of-downstream-build-in-jenkins
> The images are not copied to openpower.xyz until the very end
> of the job currently; to get them to the QEMU job, we'd need
> to copy them between jobs.

Yeah, this is how I have the BSP jobs configured (which haven't run in
a while; they need some love):

 https://openpower.xyz/view/Aspeed%20BSP/
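For reference, a downstream QEMU job pinned to an x86 node could be wired up roughly like this (a minimal sketch using the Copy Artifact plugin's copyArtifacts step; 'openbmc-build', the artifact path, and the test script name are placeholders, not our actual job configuration):

```groovy
// Declarative Jenkinsfile sketch for a standalone QEMU boot-test job.
// Assumes the Copy Artifact plugin is installed; job and path names
// below are hypothetical.
pipeline {
    // Restrict scheduling to x86 builders so the job never lands
    // on the ppc64le node while the Qemu issue is unresolved.
    agent { label 'x86_64' }
    stages {
        stage('Fetch image') {
            steps {
                // Pull the Romulus image from the last successful
                // upstream bitbake build instead of rebuilding it here.
                copyArtifacts projectName: 'openbmc-build',
                              selector: lastSuccessful(),
                              filter: 'deploy/images/romulus/**'
            }
        }
        stage('Boot test') {
            steps {
                sh './boot-qemu-test.sh deploy/images/romulus'
            }
        }
    }
}
```

The upstream job would need to archive the image as a build artifact for copyArtifacts to find it; that avoids re-transferring images by hand between jobs.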

> Honestly, if it's just a couple of weeks, I think our best solution is
> to just remove the ppc64le system from the bitbake builder queue.
> With the US holiday, it should be a quieter few weeks than what we've
> had recently.

Okay. Let's go ahead with that change.

Cheers,

Joel

>
> I have, for other optimization reasons, always wanted a way to
> dedicate a QEMU queue, but I don't really have time to deal
> with getting that going right now.
>
> >
> > > The assignment of which node builds
> > > which config is random so builder4 (our ppc64le) node gets Romulus
> > > about 25% of the time.
> > >
> > > Removing builder4 from the build queue would work, but would
> > > also remove 25% of our build capacity.
> >
> > Is there a reason why we don't use the "openbmc-ci" build slave more?
> >
> >  https://openpower.xyz/computer/openbmc-ci1/builds
>
> openbmc-ci1 is our ppc64le that runs the gerrit server. I send the
> per-repository CI
> jobs there (i.e. the make/make check google test jobs). I don't send the
> big bitbake jobs (i.e. builder) because of the cpu impacts it would have
> on gerrit.
>
> > > > I'll continue working on fixing Qemu in the future, but I don't have
> > > > time for the next two weeks due to some higher priority work.
> > >
> > > This feels like one of those things that's always tough to come back
> > > to once you let a workaround in.
> >
> > I don't understand? There's no workaround going into Qemu, this is
> > simply for the Jenkins issue. Once Qemu is fixed we can re-enable the
> > Jenkins slave.
> >
> > > Is there a specific kernel commit
> > > you've bisected down to that we could just pull from the openbmc/linux
> > > tree? Is there a workaround we could put in our openbmc/qemu for
> > > now? I'd prefer both of these over just disabling ppc64le.
> >
> > The fix corrects a nasty bug in the Linux timer driver. Reverting this
> > fix would result in inaccurate running of timers in Linux, which is
> > what the system currently does. I would consider it a high priority
> > fix to have in the product.
>
> Yeah, just feels like we're stuck between a rock and a hard place.
> I def understand the desire to keep the kernel up to date for the
> community.
>
> With the tag happening Friday, I assume we don't want to get this
> in until next week anyway? Or was this something you felt we
> really needed before the tag?
>
> >
> > The Qemu workaround is already in place, and works for x86. It doesn't
> > work for ppc64le.
