[kvm-unit-tests PATCH v2 4/9] migration: use a more robust way to wait for background job

Nicholas Piggin npiggin at gmail.com
Tue Feb 6 17:50:08 AEDT 2024


On Tue Feb 6, 2024 at 12:58 AM AEST, Marc Hartmayer wrote:
> On Fri, Feb 02, 2024 at 04:57 PM +1000, Nicholas Piggin <npiggin at gmail.com> wrote:
> > Starting a pipeline of jobs in the background does not seem to have
> > a simple way to reliably find the pid of a particular process in the
> > pipeline (because not all processes are started when the shell
> > continues to execute).
> >
> > The way PID of QEMU is derived can result in a failure waiting on a
> > PID that is not running. This is easier to hit with subsequent
> > multiple-migration support. Changing this to use $! by swapping the
> > pipeline for a fifo is more robust.
> >
> > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> > ---
>
> […snip…]
>
> >  
> > +	# Wait until the destination has created the incoming and qmp sockets
> > +	while ! [ -S ${migsock} ] ; do sleep 0.1 ; done
> > +	while ! [ -S ${qmp2} ] ; do sleep 0.1 ; done
>
> There should be timeout implemented, otherwise we might end in an
> endless loop in case of a bug. Or is the global timeout good enough to
> handle this situation?

I was going to say it's not worthwhile since we can't recover, but
actually, if nothing else, printing where the timeout happened would
be pretty helpful for gathering and diagnosing problems, especially ones
we can't reproduce locally. So yeah, good idea.

We already have a bunch of potential hangs where we don't do anything,
though. Sadly it doesn't look like $BASH_LINENO can give anything
useful about the interrupted context from a SIGHUP trap. We might be able
to do something like:

    timeout_handler() {
        echo "Timeout $timeout_msg"
        exit
    }

    trap timeout_handler HUP

    timeout_msg="waiting for destination migration socket to be created"
    while ! [ -S ${migsock} ] ; do sleep 0.1 ; done
    timeout_msg="waiting for destination QMP socket to be created"
    while ! [ -S ${qmp2} ] ; do sleep 0.1 ; done
    timeout_msg=

Unless you have any better ideas. Not sure if there are any useful
bash debugging options that could be used. The other option is adding
timeout checks to loops and blocking commands... not sure if that's
simpler or less error prone, though.
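
For reference, a minimal sketch of what that second option might look
like for the socket waits. wait_for_socket is a hypothetical helper, not
something in the patch or the existing scripts, and the 10 second default
limit is just an assumed value. It counts polls rather than reading a
clock, so the limit is only approximate:

```shell
# Hypothetical helper (not in the patch): poll until a socket exists at
# the given path, or give up and report what we were waiting for.
wait_for_socket() {
    local path=$1
    local what=$2
    local limit=${3:-10}          # approximate limit in seconds (assumed default)
    local tries=$((limit * 10))   # each poll below sleeps 0.1s

    while ! [ -S "$path" ] ; do
        if [ "$tries" -le 0 ] ; then
            # Say which wait hung, so a timeout in CI can be diagnosed
            echo "Timeout waiting for $what ($path)" >&2
            return 1
        fi
        tries=$((tries - 1))
        sleep 0.1
    done
}
```

The two loops would then become something like this, with the exit
handling left up to the caller:

    wait_for_socket "${migsock}" "destination migration socket" || exit 2
    wait_for_socket "${qmp2}" "destination QMP socket" || exit 2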

Anyway, we already have a bunch of potential hangs and timeouts that
aren't handled, so I might leave this out for a later pass unless we
come up with a really nice, easy way to go.

Thanks,
Nick

>
> > +
> >  	qmp ${qmp1} '"migrate", "arguments": { "uri": "unix:'${migsock}'" }' > ${qmpout1}
> >  
> >  	# Wait for the migration to complete
> > -- 
> > 2.42.0
> >
> >


