[PATCH v8 07/20] cachefiles: document on-demand read mode

Mon Apr 11 23:38:51 AEST 2022

Jeffle Xu <jefflexu at linux.alibaba.com> wrote:

> + (*) On-demand Read.
> +

Unnecessary extra blank line.

Jeffle Xu <jefflexu at linux.alibaba.com> wrote:

> +
> +
> +On-demand Read
> +==============
> +
> +When working in original mode, cachefiles mainly serves as a local cache for
> +remote networking fs, while in on-demand read mode, cachefiles can boost the
> +scenario where on-demand read semantics is needed, e.g. container image
> +distribution.
> +
> +The essential difference between these two modes is that, in original mode,
> +when cache miss, netfs itself will fetch data from remote, and then write the
> +fetched data into cache file. While in on-demand read mode, a user daemon is
> +responsible for fetching data and then writing to the cache file.
> +
> +``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.

You're missing a few articles there.  How about:

"""
When working in its original mode, cachefiles mainly serves as a local cache
for a remote networking fs - while in on-demand read mode, cachefiles can boost
the scenario where on-demand read semantics are needed, e.g. container image
distribution.

The essential difference between these two modes is that, in original mode,
when a cache miss occurs, the netfs will fetch the data from the remote server
and then write it to the cache file.  With on-demand read mode, however,
fetching and the data and writing it into the cache is delegated to a user
daemon.

``CONFIG_CACHEFILES_ONDEMAND`` shall be enabled to support on-demand read mode.
"""

"should be enabled".

Also, two spaces after a full stop please (but not after the dot in a
contraction, e.g. "e.g.").

> +The on-demand read mode relies on a simple protocol used for communication
> +between kernel and user daemon. The model is like::

"The protocol can be modelled as"?

> +The cachefiles kernel module will send requests to

the

> user daemon when needed.
> +

the

> User daemon needs to poll on the devnode ('/dev/cachefiles') to check if
> +there's

a

> pending request to be processed.  A POLLIN event will be returned
> +when there's

a

> pending request.

> +Then user daemon needs to read

"The user daemon [than] reads "

> the devnode to fetch one

one -> a

> request and process it
> +accordingly. It is worth nothing

nothing -> noting

> that each read only gets one request. When
> +finished processing the request,

the

> user daemon needs to write the reply to the
> +devnode.

> +Each request is started with a message header like::

"is started with" -> "starts with".
"like" -> "of the form".

> +	* ``id`` is a unique ID identifying this request among all pending
> +	  requests.

What's the scope of the uniqueness of "id"?  Is it just unique to a particular
cachefiles cache?

> +	* ``len`` identifies the whole length of this request, including the
> +	  header and following type specific payload.

type-specific.

> +An optional parameter is added to "bind" command::

to the "bind" command.

> +When

the

> "bind" command takes

takes -> is given

> without argument, it defaults to the original mode.
> +When

the

> "bind" command takes

is given

> with

the

> "ondemand" argument, i.e. "bind ondemand",
> +on-demand read mode will be enabled.

> +OPEN Request

The

> +------------
> +
> +When

the

> netfs opens a cache file for the first time, a request with

the

> +CACHEFILES_OP_OPEN opcode, a.k.a

an

> OPEN request will be sent to

the

> user daemon. The
> +payload format is like::

format is like -> of the form

> +
> +	struct cachefiles_open {
> +		__u32 volume_key_size;
> +		__u32 cookie_key_size;
> +		__u32 fd;
> +		__u32 flags;
> +		__u8  data[];
> +	};
> +

"where:"

> +	* ``data`` contains

the

> volume_key and cookie_key in sequence.

Might be better to say "contains the volume_key followed directly by the
cookie_key.  The volume key is a NUL-terminated string; cookie_key is binary
data.".

> +
> +	* ``volume_key_size`` identifies

identifies -> indicates/supplies

> the size of

the

> volume key of the cache
> +	  file, in bytes. volume_key is of string format, with a suffix '\0'.
> +
> +	* ``cookie_key_size`` identifies the size of cookie key of the cache
> +	  file, in bytes. cookie_key is of binary format, which is netfs
> +	  specific.

"... indicates the size of the cookie key in bytes."

> +
> +	* ``fd`` identifies the

the -> an

> anonymous fd of

of -> referring to

> the cache file, with

with -> through

> which user
> +	  daemon can perform write/llseek file operations on the cache file.
> +
> +
> +

The

> OPEN request contains

a

> (volume_key, cookie_key, anon_fd) triple for

triplet for the

I would probably also use {...} rather than (...).

> corresponding
> +cache file. With this triple,

triplet, the

> user daemon could

could -> can

> fetch and write data into the
> +cache file in the background, even when kernel has not triggered the

the -> a

> cache miss
> +yet.

The

> User daemon is able to distinguish the requested cache file with the given
> +(volume_key, cookie_key), and write the fetched data into

the

> cache file with

with -> using

> the
> +given anon_fd.
> +
> +After recording the (volume_key, cookie_key, anon_fd) triple,

triplet, the

> user daemon shall

shall -> should

> +reply with

reply with -> complete the request by issuing a

> "copen" (complete open) command::
> +
> +	copen <id>,<cache_size>
> +
> +	* ``id`` is exactly the id field of the previous OPEN request.
> +
> +	* When >= 0, ``cache_size`` identifies the size of the cache file;
> +	  when < 0, ``cache_size`` identifies the error code ecountered by the
> +	  user daemon.

identifies -> indicates
ecountered -> encountered

> +CLOSE Request

The

> +-------------
> +When

a

> cookie withdrawed,

withdrawed -> withdrawn

> a request with

a

> CACHEFILES_OP_CLOSE opcode, a.k.a CLOSE
> +request,

Maybe phrase as "... a close request (opcode CACHEFILES_OP_CLOSE),

> will be sent to user daemon. It will notify

the

> user daemon to close the
> +attached anon_fd. The payload format is like::

like -> of the form

> +
> +	struct cachefiles_close {
> +		__u32 fd;
> +	};
> +

"where:"

> +	* ``fd`` identifies the anon_fd to be closed, which is exactly the same

"... which should be the same as that provided to the OPEN request".

Is it possible for userspace to move the fd around with dup() or whatever?

> +	  with that in OPEN request.
> +
> +
> +READ Request

The

> +------------
> +
> +When on-demand read mode is turned on, and

a

> cache miss encountered,

the

> kernel will
> +send a request with CACHEFILES_OP_READ opcode, a.k.a READ request,

"send a READ request (opcode CACHEFILES_OP_READ)"

> to

the

> user
> +daemon. It will notify

It will notify -> This will ask/tell

> user daemon to fetch data in the requested file range.
> +The payload format is like::

format is like -> is of the form

> +
> +	struct cachefiles_read {
> +		__u64 off;
> +		__u64 len;
> +		__u32 fd;
> +	};
> +
> +	* ``off`` identifies the starting offset of the requested file range.

identifies -> indicates

> +
> +	* ``len`` identifies the length of the requested file range.
> +

identifies -> indicates (you could alternatively say "specified")

> +	* ``fd`` identifies the anonymous fd of the requested cache file. It is
> +	  guaranteed that it shall be the same with

"same with" -> "same as"

Since the kernel cannot make such a guarantee, I think you may need to restate
this as something like "Userspace must present the same fd as was given in the
previous OPEN request".

> the fd field in the previous
> +	  OPEN request.
> +
> +When receiving one

one -> a

> READ request,

the

> user daemon needs to fetch

the

> data of the
> +requested file range, and then write the fetched data

, and then write the fetched data -> and write it

> into cache file

cache file -> cache

> with

using

> the
> +given anonymous fd.

+ to indicate the destination.

> +
> +When finished

When finished -> To finish

> processing the READ request,

the

> user daemon needs to reply with

the

> +CACHEFILES_IOC_CREAD ioctl on the corresponding anon_fd::
> +
> +	ioctl(fd, CACHEFILES_IOC_CREAD, id);
> +
> +	* ``fd`` is exactly the fd field of the previous READ request.

Does that have to be true?  What if userspace moves it somewhere else?

> +
> +	* ``id`` is exactly the id field of the previous READ request.

is exactly the -> must match the

David