[PATCH V10 09/19] block: introduce bio_bvecs()

Sagi Grimberg sagi at grimberg.me
Wed Nov 21 15:25:46 AEDT 2018


>> I would like to avoid growing bvec tables and keep everything
>> preallocated. Plus, a bvec_iter operates on a bvec which means
>> we'll need a table there as well... Not liking it so far...
> 
> In the case of multiple bios in one request, we can't know how many
> bvecs there are without calling rq_bvecs(), so it may not be suitable
> to preallocate the table. If you have to send the IO request in one
> send(), runtime allocation may be inevitable.

I don't want to do that; I want to work on a single bvec at a time, like
the current implementation does.

> If you don't require the IO request to be sent in one send(), you may
> send one bio at a time and just use the bio's bvec table directly,
> as in the single-bio case in lo_rw_aio().

We'd need some indication that we need to reinit the iter with the
new bvec. Today we do:

static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
                int len)
{
        req->snd.data_sent += len;
        req->pdu_sent += len;
        iov_iter_advance(&req->snd.iter, len);
        /* current bio exhausted but the request still has data: next bio */
        if (!iov_iter_count(&req->snd.iter) &&
            req->snd.data_sent < req->data_len) {
                req->snd.curr_bio = req->snd.curr_bio->bi_next;
                nvme_tcp_init_send_iter(req);
        }
}

and re-initialize the send iter. I imagine that now I will need to
switch to the next bvec, and only when I'm on the last one will I need
to move to the next bio...

Do you offer an API for that?
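
Roughly, I imagine nvme_tcp_advance_req() ending up something like the
sketch below; nvme_tcp_last_bvec() and nvme_tcp_next_bvec() are names I
just made up to show the shape, not anything from your series:

static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
                int len)
{
        req->snd.data_sent += len;
        req->pdu_sent += len;
        iov_iter_advance(&req->snd.iter, len);
        if (!iov_iter_count(&req->snd.iter) &&
            req->snd.data_sent < req->data_len) {
                /* hypothetical: step through the current bio's bvecs... */
                if (!nvme_tcp_last_bvec(req))
                        nvme_tcp_next_bvec(req);
                else    /* ...and only on the last one touch the next bio */
                        req->snd.curr_bio = req->snd.curr_bio->bi_next;
                nvme_tcp_init_send_iter(req);
        }
}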


>>> Can this approach avoid your blocking issue? You may see this
>>> example in the 'rq->bio != rq->biotail' branch of lo_rw_aio().
>>
>> This is exactly an example of not ignoring the bios...
> 
> Yeah, that is the most common case, given that merging is enabled
> most of the time. If the driver or device doesn't care about merging,
> you can disable it and always get single-bio requests; then the
> bio's bvec table can be reused for send().

Does a bvec_iter span multiple bvecs with your patches? I didn't see that change.

>> I'm not sure how this helps me either. Unless we can set a bvec_iter to
>> span bvecs, or have an abstract bio crossing when we re-initialize the
>> bvec_iter, I don't see how I can ignore bios completely...
> 
> rq_for_each_bvec() will iterate over all bvecs from all bios, so you
> needn't see any bio in this req.
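
If I understand correctly, that would look roughly like the sketch
below, assuming rq_for_each_bvec() follows the same calling convention
as rq_for_each_segment(); the function around it is purely illustrative:

static void nvme_tcp_walk_request(struct request *rq)
{
        struct req_iterator iter;
        struct bio_vec bvec;

        /* visits every (multi-page) bvec of every bio in the request */
        rq_for_each_bvec(bvec, rq, iter) {
                /* bvec.bv_page/bv_offset/bv_len describe one segment */
        }
}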

But I don't need this iteration; I need a transparent API like:
bvec2 = rq_bvec_next(rq, bvec)

This way I can simply always reinit my iter without thinking about how
the request/bios/bvecs are constructed...
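
To be concrete, a helper with semantics along these lines is what I
mean. This is only a sketch: it ignores partial-bvec state
(bi_iter.bi_bvec_done) and assumes the caller passes back a pointer
into one of the request's bvec tables:

/* return the bvec after @bvec in @rq, crossing bio boundaries; NULL at the end */
static struct bio_vec *rq_bvec_next(struct request *rq, struct bio_vec *bvec)
{
        struct bio *bio;

        __rq_for_each_bio(bio, rq) {
                struct bio_vec *first = bio->bi_io_vec;
                struct bio_vec *last = first + bio->bi_vcnt - 1;

                if (bvec < first || bvec > last)
                        continue;
                if (bvec < last)
                        return bvec + 1;
                /* last bvec of this bio: move on to the next bio */
                return bio->bi_next ? bio->bi_next->bi_io_vec : NULL;
        }
        return NULL;
}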

> rq_bvecs() will return how many bvecs there are in this request
> (covering all bios in this req).

Still not very useful given that I don't want to use a table...

>>> So it looks like the nvme-tcp host driver might be the 2nd driver
>>> which benefits from multi-page bvecs directly.
>>>
>>> The multi-page bvec V11 has passed my tests and addressed almost
>>> all the comments from the V10 review. I removed bio_vecs() in V11,
>>> but that won't be a big deal; we can introduce it anytime there
>>> is a requirement.
>>
>> multipage-bvecs and nvme-tcp are going to conflict, so it would be good
>> to coordinate on this. I think that the nvme-tcp host needs some
>> adjustments, such as setting a bvec_iter. I'm under the impression that
>> the change is rather small and self-contained, but I'm not sure I have
>> the full picture here.
> 
> I guess I may not be getting your exact requirement on the block IO
> iterator from nvme-tcp, :-(

They are pretty much listed above. Today nvme-tcp sets an iterator with:

vec = __bvec_iter_bvec(bio->bi_io_vec, bio->bi_iter);
nsegs = bio_segments(bio);
size = bio->bi_iter.bi_size;
offset = bio->bi_iter.bi_bvec_done;
iov_iter_bvec(&req->snd.iter, WRITE, vec, nsegs, size);

and when done, iterate to the next bio and do the same.

With multi-page bvecs it would be great if we could simply have
something like rq_bvec_next(); that would pretty much satisfy
the requirements from the nvme-tcp side...
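
i.e. something like the sketch below, using the hypothetical
rq_bvec_next() from above, and assuming the blk-mq request is reachable
via blk_mq_rq_from_pdu() and that nvme_tcp_init_send_iter() builds the
iov_iter over the single bvec in req->snd.curr_bvec (also a made-up
field):

static inline void nvme_tcp_advance_req(struct nvme_tcp_request *req,
                int len)
{
        req->snd.data_sent += len;
        req->pdu_sent += len;
        iov_iter_advance(&req->snd.iter, len);
        if (!iov_iter_count(&req->snd.iter) &&
            req->snd.data_sent < req->data_len) {
                /* no bio awareness in the driver at all */
                req->snd.curr_bvec = rq_bvec_next(blk_mq_rq_from_pdu(req),
                                                  req->snd.curr_bvec);
                nvme_tcp_init_send_iter(req);
        }
}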

