[RFC PATCH] staging: erofs: add document

Chao Yu yuchao0 at huawei.com
Mon Jan 14 13:13:39 AEDT 2019


Hi Xiang,

Nice work!

Few trivial comments as below, anyway please add:

Reviewed-by: Chao Yu <yuchao0 at huawei.com>

On 2019/1/12 18:35, Gao Xiang wrote:
> This documents key feature, design, and usage of erofs.
> 
> Signed-off-by: Gao Xiang <hsiangkao at aol.com>
> ---
>  .../erofs/Documentation/filesystems/erofs.txt      | 160 +++++++++++++++++++++
>  1 file changed, 160 insertions(+)
>  create mode 100644 drivers/staging/erofs/Documentation/filesystems/erofs.txt
> 
> diff --git a/drivers/staging/erofs/Documentation/filesystems/erofs.txt b/drivers/staging/erofs/Documentation/filesystems/erofs.txt
> new file mode 100644
> index 000000000000..f1d6a9701caa
> --- /dev/null
> +++ b/drivers/staging/erofs/Documentation/filesystems/erofs.txt
> @@ -0,0 +1,160 @@
> +Overview
> +========
> +
> +EROFS file-system stands for Enhanced Read-Only File System. Different
> +from other read-only file systems, it aims to be designed for flexibility,
> +scalability, but be kept simple and high performance.
> +
> +Here is the main features of EROFS:
> + - Little endian on-disk design;
> +
> + - 4KB block size and therefore maximum 16TB address space;
> +
> + - Metadata and data could be mixed by design;
> +
> + - 2 inode versions for different requirements:
> +                          v1            v2
> +   Inode metadata size:   32 bytes      64 bytes
> +   Max file size:         4 GB          16 EB (limited by max. vol size)
> +   Max uids/gids:         65536         4294967296
> +   File creation time:    no            yes (64 + 32-bit timestamp)
> +   Max hard links:        65536         4294967296
> +   Metadata reserved:     4             14
> +
> + - Support extended attributes (xattrs)
> +
> + - Support xattr inline and tail-end data inline for all files;
> +
> + - Support transparent data compression as an option:
> +   LZ4 algorithm with 4 KB fixed-output compression for high performance;
> +
> +The following git tree provides the file system user-space tools under
> +development (ex, formatting tool mkfs.erofs):
> +>> git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git
> +
> +Bugs and patches are welcome, please help kindly us and send them to
> +the following mailing list:
> +>> linux-erofs mailing list   <linux-erofs at lists.ozlabs.org>
> +
> +Note that EROFS is still working in progress as a Linux staging driver,
> +Cc the staging mailing list is really recommended:
> +>> Linux Driver Project Developer List <devel at driverdev.osuosl.org>
> +
> +Mount options
> +=============
> +
> +fault_injection=%d     Enable fault injection in all supported types with
> +                       specified injection rate.

			Supported injection type:
			Type_Name                Type_Value
			FAULT_KMALLOC            0x000000001

> +(no)user_xattr         Setup Extended User Attributes. Note: xattr is enabled
> +                       by default if CONFIG_EROFS_FS_XATTR is selected.
> +(no)acl                Setup POSIX Access Control List. Note: acl is enabled
> +                       by default if CONFIG_EROFS_FS_POSIX_ACL is selected.
> +
> +On-disk details
> +===============
> +
> +Summary
> +-------
> +Different from other read-only file systems, an EROFS volume is designed
> +to be as simple as possible:
> +
> +                                |-> aligned with the block size
> +   ____________________________________________________________
> +  | |SB| | ... | Metadata | ... | Data | Metadata | ... | Data |
> +  |_|__|_|_____|__________|_____|______|__________|_____|______|
> +  0 +1K
> +
> +All data areas should be aligned with the block size, but metadata areas
> +may not. All metadatas can be now observed in two different spaces (views):
> + 1) Inode metadata space
> +    Each valid inode should be aligned with an inode slot, which is a fixed
> +    value (32 bytes) and designed to be kept in line with v1 inode size.
> +
> +    Each inode can be directly found with the following formula:
> +         inode offset = meta_blkaddr * block_size + 32 * nid
> +
> +                                |-> aligned with 8B
> +                                           |-> followed closely
> +    + meta_blkaddr blocks                                      |-> another slot
> +     _____________________________________________________________________
> +    |  ...   | inode |  xattrs  | extents  | data inline | ... | inode ...
> +    |________|_______|(optional)|(optional)|__(optional)_|_____|__________
> +             |-> aligned with the inode slot size
> +                  .                   .
> +                .                         .
> +              .                              .
> +            .                                    .
> +          .                                         .
> +        .                                              .
> +      .____________________________________________________|-> aligned with 4B
> +      | xattr_ibody_header | shared xattrs | inline xattrs |
> +      |____________________|_______________|_______________|
> +      |->    12 bytes    <-|->x * 4 bytes<-|               .
> +                          .                .                 .
> +                    .                      .                   .
> +               .                           .                     .
> +           ._______________________________.______________________.
> +           | id | id | id | id |  ... | id | ent | ... | ent| ... |
> +           |____|____|____|____|______|____|_____|_____|____|_____|
> +                                           |-> aligned with 4B
> +                                                       |-> aligned with 4B
> +
> +    Inode could be 32 or 64 bytes, which can be distinguished from a common
> +    field which all inode versions have -- i_advise:
> +
> +        __________________               __________________
> +       |     i_advise     |             |     i_advise     |
> +       |__________________|             |__________________|
> +       |        ...       |             |        ...       |
> +       |                  |             |                  |
> +       |__________________| 32 bytes    |                  |
> +                                        |                  |
> +                                        |__________________| 64 bytes
> +
> +    Xattrs, extents, data inline are followed by the corresponding inode with
> +    proper alignes, and they could be optional for different data mappings,
> +    currently there are totally 3 valid data mappings:
> +
> +     1) flat file data without data inline (no extent);
> +     2) fixed-output size data compression (must have extents);
> +     3) flat file data with tail-end data inline (no extent);
> +
> +    The size of the optional xattrs is indicated by i_xattr_count in inode
> +    header. Large xattrs or xattrs shared by many different files can be
> +    stored in shared xattrs metadata rather than inlined right after inode.
> +
> + 2) Shared xattrs metadata space
> +    Shared xattrs space is similar to the above inode space, started with
> +    a specific block indicated by xattr_blkaddr, organized one by one with
> +    proper align.
> +
> +    Each share xattr can be found by the following formula:
> +         xattr offset = xattr_blkaddr * block_size + 4 * xattr_id
> +
> +                           |-> aligned by  4 bytes
> +    + xattr_blkaddr blocks                     |-> aligned with 4 bytes
> +     _________________________________________________________________________
> +    |  ...   | xattr_entry |  xattr data | ... |  xattr_entry | xattr data  ...
> +    |________|_____________|_____________|_____|______________|_______________
> +
> +Directories
> +-----------
> +All directories are now organized in a compact on-disk format. Note that
> +each directory block is divided into index and name areas in order to
> +support random file lookup, and all directory entries are strictly written
> +in alphabetical order in order to support improved prefix binary search
> +algorithm.
> +
> +
> +                 +--------------------------+
> +                /                           |
> +               /              +-------------+----------------+
> +              /              /             \|/namelen1      \|/ namelenN-1

						|		|
						v		v

> + ____________+______________+___________________________________________
> +| dirent | dirent | ... | dirent | filename | filename | ... | filename |
> +|____0___|____1___|_____|___N-1__|____0_____|____1_____|_____|___N-1____|
> +     \                          /|\                           * could have

				    ^
				    |

> +      \                          |                              trailing '\0'
> +       \                         |
> +        +------------------------+ namelen0
> +
> 



More information about the Linux-erofs mailing list