[Lguest] Virtio block problems on newer kernels

Rusty Russell rusty at rustcorp.com.au
Thu May 22 16:05:39 EST 2014


Sakari Ailus <sakari.ailus at iki.fi> writes:
> Hi,
>
> On Mon, Apr 07, 2014 at 11:33:23AM +0300, Sakari Ailus wrote:
>> Sakari Ailus wrote:
>> >On Mon, Mar 24, 2014 at 03:31:33AM +0200, Sakari Ailus wrote:
>> >>I haven't had time to debug this further and bisecting will be
>> >>painful as this isn't that easy to reproduce: I first noticed after
>> >>running 3.13.6 for a day but first thought it was a network issue.
>> >>The problem appeared between the two above versions (3.10.10 and
>> >>3.13.6).
>> >
>> >Oops --- after having the problem awhile and then sending the e-mail to the
>> >list, I now realise I'm using a launcher which is quite old. I'll replace
>> >that with a newer one and see if that fixes the issues.
>> 
>> So I did that. No luck.
>> 
>> It seems that virtio block devices can just get stuck. The (guest) processes
>> that try to access them will stay in uninterruptible sleep, sometimes for
>> minutes, sometimes apparently forever. This appears to be a per-virtio block
>> device issue when it happens. The host lguest processes are simply in sleep
>> state.
>> 
>> Host kernel version does not matter but that of the guest does. 3.14.0 seems
>> to be affected, too. 3.12.0 works.
>> 
>> I'll try to bisect and see what I can find.
>
> I did the bisect and found this:
>
> commit 1cf7e9c68fe84248174e998922b39e508375e7c1
> Author: Jens Axboe <axboe at kernel.dk>
> Date:   Fri Nov 1 10:52:52 2013 -0600
>
>     virtio_blk: blk-mq support

Interesting!

>     Switch virtio-blk from the dual support for old-style requests and bios
>     to use the block-multiqueue.
>     
>     Acked-by: Asias He <asias at redhat.com>
>     Signed-off-by: Jens Axboe <axboe at kernel.dk>
>     Signed-off-by: Christoph Hellwig <hch at lst.de>
>
> Erratic block i/o behaviour begins when that patch is applied to the guest
> kernel. I wonder if someone else has seen this. I can reproduce it
> relatively easily by running bonnie++ in a guest:
>
> 	bonnie++ -n 0 -s 256M -r 10 -b
>
> I'm using an up-to-date lguest launcher.

I cannot reproduce this under qemu (that's how I run my 32-bit host
these days) with Linus' latest.  Of course, if the issue is that we have
a race and performance has increased to trigger it, it's not surprising.

I'm installing on an old laptop now (I don't have any native 32-bit
machines here).

Cheers,
Rusty.
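
For anyone trying to confirm the symptom described in this thread (guest
processes stuck in uninterruptible sleep while bonnie++ runs), the following
is a minimal sketch that lists tasks currently in state D by reading
/proc/<pid>/stat inside the guest. It is not part of the original report or
of lguest; the helper names parse_stat and blocked_tasks are hypothetical,
and it only looks at thread-group leaders, not at individual threads under
/proc/<pid>/task.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen])
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc="/proc"):
    """List (pid, comm) for tasks whose state field is D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # task exited (or stat was unreadable) while scanning
        if state == "D":
            found.append((pid, comm))
    return sorted(found)


if __name__ == "__main__":
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running this repeatedly in the guest during the bonnie++ run should show
whether the same tasks stay in state D across samples, which would separate
a stuck request from ordinary short I/O waits.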

