Commit 67a135b8 authored by Linus Torvalds
Pull erofs updates from Gao Xiang:
 "There are some new features available for this cycle. Firstly, EROFS
  LZMA algorithm support, specifically called MicroLZMA, is available as
  an option for embedded devices, LiveCDs and/or as the secondary
  auxiliary compression algorithm besides the primary algorithm in one
  file.

  In order to better support LZMA fixed-sized output compression,
  especially for the 4KiB pcluster size (which has the lowest memory
  pressure and is thus useful for memory-sensitive scenarios), Lasse
  introduced a new LZMA header/container format called MicroLZMA to
  minimize the original LZMA1 header (for example, we don't need to
  waste a 4-byte dictionary size and another 8-byte uncompressed size,
  which can be calculated by the fs directly, for each pcluster) and
  enable EROFS fixed-sized output compression.

  Note that MicroLZMA can also be used later by things other than EROFS
  where wasting a minimal amount of space on headers is important, and
  it is only compiled when XZ_DEC_MICROLZMA is enabled. MicroLZMA is
  supported by the latest upstream XZ Embedded [1] & XZ Utils [2]; the
  latest related XZ Embedded upstream patches by the XZ author Lasse
  are applied here.

  Secondly, multiple devices are also supported in this cycle, which is
  designed for multi-layer container images. By working together with
  inter-layer data deduplication and compression, we can achieve the
  next high-performance container image solution. Our team will announce
  the new Nydus container image service [3] implementation with the new
  RAFS v6 (EROFS-compatible) format at Open Source Summit 2021 China [4]
  soon.

  Besides, the secondary compression head support and readmore
  decompression strategy are also included in this cycle. There are also
  some minor bugfixes and cleanups, as always.

  Summary:

   - support multiple devices for multi-layer container images;

   - support the secondary compression head;

   - support readmore decompression strategy;

   - support new LZMA algorithm (specifically called MicroLZMA);

   - some bugfixes & cleanups"

* tag 'erofs-for-5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: don't trigger WARN() when decompression fails
  erofs: get rid of ->lru usage
  erofs: lzma compression support
  erofs: rename some generic methods in decompressor
  lib/xz, lib/decompress_unxz.c: Fix spelling in comments
  lib/xz: Add MicroLZMA decoder
  lib/xz: Move s->lzma.len = 0 initialization to lzma_reset()
  lib/xz: Validate the value before assigning it to an enum variable
  lib/xz: Avoid overlapping memcpy() with invalid input with in-place decompression
  erofs: introduce readmore decompression strategy
  erofs: introduce the secondary compression head
  erofs: get compression algorithms directly on mapping
  erofs: add multiple device support
  erofs: decouple basic mount options from fs_context
  erofs: remove the fast path of per-CPU buffer decompression
parents cd3e8ea8 a0961f35
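
The header saving described in the pull message can be made concrete with a little arithmetic. The sketch below only assumes the two field sizes quoted above (a 4-byte dictionary size and an 8-byte uncompressed size) and counts whole pclusters; the macro and function names are illustrative and are not part of the kernel or XZ Embedded.

```c
#include <assert.h>
#include <stddef.h>

/* Field sizes quoted in the pull message for the original LZMA1 header. */
#define LZMA1_DICT_SIZE_BYTES   4u	/* dictionary size field */
#define LZMA1_UNCOMP_SIZE_BYTES 8u	/* uncompressed size field */

/* Hypothetical helper: bytes saved per pcluster by omitting both fields. */
static inline size_t header_bytes_saved_per_pcluster(void)
{
	return LZMA1_DICT_SIZE_BYTES + LZMA1_UNCOMP_SIZE_BYTES;
}

/*
 * Hypothetical helper: bytes saved across a compressed image of
 * image_bytes that is split into pclusters of pcluster_bytes each.
 * Only whole pclusters are counted.
 */
static inline size_t header_bytes_saved(size_t image_bytes,
					size_t pcluster_bytes)
{
	assert(pcluster_bytes != 0);
	return (image_bytes / pcluster_bytes) *
	       header_bytes_saved_per_pcluster();
}
```

With 4KiB pclusters, 12 bytes is roughly 0.29% of each pcluster, so for 1 GiB of compressed data `header_bytes_saved(1u << 30, 4096)` comes to 3 MiB.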
+8 −4
@@ -19,9 +19,10 @@ It is designed as a better filesystem solution for the following scenarios:
    immutable and bit-for-bit identical to the official golden image for
    their releases due to security and other considerations and

- - hope to save some extra storage space with guaranteed end-to-end performance
-   by using reduced metadata and transparent file compression, especially
-   for those embedded devices with limited memory (ex, smartphone);
+ - hope to minimize extra storage space with guaranteed end-to-end performance
+   by using compact layout, transparent file compression and direct access,
+   especially for those embedded devices with limited memory and high-density
+   hosts with numerous containers;

 Here is the main features of EROFS:

@@ -51,7 +52,9 @@ Here is the main features of EROFS:
  - Support POSIX.1e ACLs by using xattrs;

  - Support transparent data compression as an option:
-   LZ4 algorithm with the fixed-sized output compression for high performance.
+   LZ4 algorithm with the fixed-sized output compression for high performance;
+
+ - Multiple device support for multi-layer container images.

 The following git tree provides the file system user-space tools under
 development (ex, formatting tool mkfs.erofs):
@@ -87,6 +90,7 @@ cache_strategy=%s Select a strategy for cached decompression from now on:
 dax={always,never}     Use direct access (no page cache).  See
                        Documentation/filesystems/dax.rst.
 dax                    A legacy option which is an alias for ``dax=always``.
+device=%s              Specify a path to an extra device to be used together.
 ===================    =========================================================

 On-disk details
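
The new `device=%s` mount option shown above takes one path per extra device. As a rough illustration, the helper below joins a list of paths into a comma-separated option string of the kind passed as the `data` argument of mount(2). It assumes that `device=` may be given once per extra device; the helper name and the device paths in the usage note are hypothetical, and paths containing commas are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join extra device paths into
 * "device=<a>,device=<b>,...". Returns 0 on success, or -1 if the
 * result does not fit into buf (buf contents are then unspecified).
 */
static int erofs_build_device_opts(char *buf, size_t size,
				   const char *const *paths, size_t n)
{
	size_t used = 0;

	assert(buf != NULL && size > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		/* snprintf() reports the length it wanted to write */
		int len = snprintf(buf + used, size - used, "%sdevice=%s",
				   i ? "," : "", paths[i]);

		if (len < 0 || (size_t)len >= size - used)
			return -1;	/* output was truncated */
		used += (size_t)len;
	}
	return 0;
}
```

For example, the paths `/dev/vdb` and `/dev/vdc` yield `device=/dev/vdb,device=/dev/vdc`, which could then be handed to `mount()` from `<sys/mount.h>` together with the primary device, the `"erofs"` filesystem type and `MS_RDONLY`.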
+31 −9
@@ -6,16 +6,22 @@ config EROFS_FS
 	select FS_IOMAP
 	select LIBCRC32C
 	help
-	  EROFS (Enhanced Read-Only File System) is a lightweight
-	  read-only file system with modern designs (eg. page-sized
-	  blocks, inline xattrs/data, etc.) for scenarios which need
-	  high-performance read-only requirements, e.g. Android OS
-	  for mobile phones and LIVECDs.
+	  EROFS (Enhanced Read-Only File System) is a lightweight read-only
+	  file system with modern designs (e.g. no buffer heads, inline
+	  xattrs/data, chunk-based deduplication, multiple devices, etc.) for
+	  scenarios which need high-performance read-only solutions, e.g.
+	  smartphones with Android OS, LiveCDs and high-density hosts with
+	  numerous containers;

-	  It also provides fixed-sized output compression support,
-	  which improves storage density, keeps relatively higher
-	  compression ratios, which is more useful to achieve high
-	  performance for embedded devices with limited memory.
+	  It also provides fixed-sized output compression support in order to
+	  improve storage density as well as keep relatively higher compression
+	  ratios and implements in-place decompression to reuse the file page
+	  for compressed data temporarily with proper strategies, which is
+	  quite useful to ensure guaranteed end-to-end runtime decompression
+	  performance under extremely memory pressure without extra cost.
+
+	  See the documentation at <file:Documentation/filesystems/erofs.rst>
+	  for more details.

 	  If unsure, say N.

@@ -76,3 +82,19 @@ config EROFS_FS_ZIP
 	  Enable fixed-sized output compression for EROFS.

 	  If you don't want to enable compression feature, say N.
+
+config EROFS_FS_ZIP_LZMA
+	bool "EROFS LZMA compressed data support"
+	depends on EROFS_FS_ZIP
+	select XZ_DEC
+	select XZ_DEC_MICROLZMA
+	help
+	  Saying Y here includes support for reading EROFS file systems
+	  containing LZMA compressed data, specifically called microLZMA. it
+	  gives better compression ratios than the LZ4 algorithm, at the
+	  expense of more CPU overhead.
+
+	  LZMA support is an experimental feature for now and so most file
+	  systems will be readable without selecting this option.
+
+	  If unsure, say N.
+1 −0
@@ -4,3 +4,4 @@ obj-$(CONFIG_EROFS_FS) += erofs.o
 erofs-objs := super.o inode.o data.o namei.o dir.o utils.o pcpubuf.o
 erofs-$(CONFIG_EROFS_FS_XATTR) += xattr.o
 erofs-$(CONFIG_EROFS_FS_ZIP) += decompressor.o zmap.o zdata.o
+erofs-$(CONFIG_EROFS_FS_ZIP_LZMA) += decompressor_lzma.o
+19 −9
@@ -8,11 +8,6 @@

 #include "internal.h"

-enum {
-	Z_EROFS_COMPRESSION_SHIFTED = Z_EROFS_COMPRESSION_MAX,
-	Z_EROFS_COMPRESSION_RUNTIME_MAX
-};
-
 struct z_erofs_decompress_req {
 	struct super_block *sb;
 	struct page **in, **out;
@@ -25,6 +20,12 @@ struct z_erofs_decompress_req {
 	bool inplace_io, partial_decoding;
 };

+struct z_erofs_decompressor {
+	int (*decompress)(struct z_erofs_decompress_req *rq,
+			  struct page **pagepool);
+	char *name;
+};
+
 /* some special page->private (unsigned long, see below) */
 #define Z_EROFS_SHORTLIVED_PAGE		(-1UL << 2)
 #define Z_EROFS_PREALLOCATED_PAGE	(-2UL << 2)
@@ -63,7 +64,7 @@ static inline bool z_erofs_is_shortlived_page(struct page *page)
 	return true;
 }

-static inline bool z_erofs_put_shortlivedpage(struct list_head *pagepool,
+static inline bool z_erofs_put_shortlivedpage(struct page **pagepool,
 					      struct page *page)
 {
 	if (!z_erofs_is_shortlived_page(page))
@@ -74,13 +75,22 @@ static inline bool z_erofs_put_shortlivedpage(struct list_head *pagepool,
 		put_page(page);
 	} else {
 		/* follow the pcluster rule above. */
-		set_page_private(page, 0);
-		list_add(&page->lru, pagepool);
+		erofs_pagepool_add(pagepool, page);
 	}
 	return true;
 }

+#define MNGD_MAPPING(sbi)	((sbi)->managed_cache->i_mapping)
+static inline bool erofs_page_is_managed(const struct erofs_sb_info *sbi,
+					 struct page *page)
+{
+	return page->mapping == MNGD_MAPPING(sbi);
+}
+
 int z_erofs_decompress(struct z_erofs_decompress_req *rq,
-		       struct list_head *pagepool);
+		       struct page **pagepool);

+/* prototypes for specific algorithms */
+int z_erofs_lzma_decompress(struct z_erofs_decompress_req *rq,
+			    struct page **pagepool);
 #endif
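
The `struct z_erofs_decompressor` added above pairs a `decompress` callback with a `name`, which suggests a per-algorithm dispatch table. The user-space sketch below shows one way such a table can be indexed by algorithm id. It is an assumption about how the structure might be used, not the kernel's code: the request type is reduced to two fields, the page pool argument is dropped, and every identifier is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct z_erofs_decompress_req. */
struct req {
	int alg;	/* algorithm id recorded for this request */
	int handled_by;	/* set by the stub that ran, for illustration */
};

/* Mirrors the shape of z_erofs_decompressor, minus the page pool. */
struct decompressor {
	int (*decompress)(struct req *rq);
	const char *name;
};

enum { ALG_LZ4, ALG_LZMA, ALG_MAX };	/* hypothetical algorithm ids */

static int lz4_stub(struct req *rq)
{
	rq->handled_by = ALG_LZ4;
	return 0;
}

static int lzma_stub(struct req *rq)
{
	rq->handled_by = ALG_LZMA;
	return 0;
}

/* One entry per algorithm; designated initializers keep ids and rows aligned. */
static const struct decompressor decompressors[ALG_MAX] = {
	[ALG_LZ4]  = { .decompress = lz4_stub,  .name = "lz4"  },
	[ALG_LZMA] = { .decompress = lzma_stub, .name = "lzma" },
};

/* Look up the callback for rq->alg; returns -1 for unknown algorithms. */
static int dispatch(struct req *rq)
{
	assert(rq != NULL);
	if (rq->alg < 0 || rq->alg >= ALG_MAX ||
	    !decompressors[rq->alg].decompress)
		return -1;
	return decompressors[rq->alg].decompress(rq);
}
```

A table like this lets a new algorithm be added by supplying one callback and one table row, leaving the call site unchanged.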
+60 −13
@@ -89,6 +89,7 @@ static int erofs_map_blocks(struct inode *inode,
 	erofs_off_t pos;
 	int err = 0;

+	map->m_deviceid = 0;
 	if (map->m_la >= inode->i_size) {
 		/* leave out-of-bound access unmapped */
 		map->m_flags = 0;
@@ -135,14 +136,8 @@ static int erofs_map_blocks(struct inode *inode,
 		map->m_flags = 0;
 		break;
 	default:
-		/* only one device is supported for now */
-		if (idx->device_id) {
-			erofs_err(sb, "invalid device id %u @ %llu for nid %llu",
-				  le16_to_cpu(idx->device_id),
-				  chunknr, vi->nid);
-			err = -EFSCORRUPTED;
-			goto out_unlock;
-		}
+		map->m_deviceid = le16_to_cpu(idx->device_id) &
+			EROFS_SB(sb)->device_id_mask;
 		map->m_pa = blknr_to_addr(le32_to_cpu(idx->blkaddr));
 		map->m_flags = EROFS_MAP_MAPPED;
 		break;
@@ -155,11 +150,55 @@ static int erofs_map_blocks(struct inode *inode,
 	return err;
 }

+int erofs_map_dev(struct super_block *sb, struct erofs_map_dev *map)
+{
+	struct erofs_dev_context *devs = EROFS_SB(sb)->devs;
+	struct erofs_device_info *dif;
+	int id;
+
+	/* primary device by default */
+	map->m_bdev = sb->s_bdev;
+	map->m_daxdev = EROFS_SB(sb)->dax_dev;
+
+	if (map->m_deviceid) {
+		down_read(&devs->rwsem);
+		dif = idr_find(&devs->tree, map->m_deviceid - 1);
+		if (!dif) {
+			up_read(&devs->rwsem);
+			return -ENODEV;
+		}
+		map->m_bdev = dif->bdev;
+		map->m_daxdev = dif->dax_dev;
+		up_read(&devs->rwsem);
+	} else if (devs->extra_devices) {
+		down_read(&devs->rwsem);
+		idr_for_each_entry(&devs->tree, dif, id) {
+			erofs_off_t startoff, length;
+
+			if (!dif->mapped_blkaddr)
+				continue;
+			startoff = blknr_to_addr(dif->mapped_blkaddr);
+			length = blknr_to_addr(dif->blocks);
+
+			if (map->m_pa >= startoff &&
+			    map->m_pa < startoff + length) {
+				map->m_pa -= startoff;
+				map->m_bdev = dif->bdev;
+				map->m_daxdev = dif->dax_dev;
+				break;
+			}
+		}
+		up_read(&devs->rwsem);
+	}
+	return 0;
+}
+
 static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 		unsigned int flags, struct iomap *iomap, struct iomap *srcmap)
 {
 	int ret;
 	struct erofs_map_blocks map;
+	struct erofs_map_dev mdev;

 	map.m_la = offset;
 	map.m_llen = length;
@@ -168,8 +207,16 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,
 	if (ret < 0)
 		return ret;

-	iomap->bdev = inode->i_sb->s_bdev;
-	iomap->dax_dev = EROFS_I_SB(inode)->dax_dev;
+	mdev = (struct erofs_map_dev) {
+		.m_deviceid = map.m_deviceid,
+		.m_pa = map.m_pa,
+	};
+	ret = erofs_map_dev(inode->i_sb, &mdev);
+	if (ret)
+		return ret;
+
+	iomap->bdev = mdev.m_bdev;
+	iomap->dax_dev = mdev.m_daxdev;
 	iomap->offset = map.m_la;
 	iomap->length = map.m_llen;
 	iomap->flags = 0;
@@ -188,15 +235,15 @@ static int erofs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,

 		iomap->type = IOMAP_INLINE;
 		ipage = erofs_get_meta_page(inode->i_sb,
-					    erofs_blknr(map.m_pa));
+					    erofs_blknr(mdev.m_pa));
 		if (IS_ERR(ipage))
 			return PTR_ERR(ipage);
 		iomap->inline_data = page_address(ipage) +
-					erofs_blkoff(map.m_pa);
+					erofs_blkoff(mdev.m_pa);
 		iomap->private = ipage;
 	} else {
 		iomap->type = IOMAP_MAPPED;
-		iomap->addr = map.m_pa;
+		iomap->addr = mdev.m_pa;
 	}
 	return 0;
 }
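
The second branch of `erofs_map_dev()` above resolves a flat physical address (device id 0) by scanning the extra devices for the one whose mapped block range contains it, then rebasing the address to that device. The user-space model below sketches just that range lookup. It assumes 4KiB blocks and replaces the IDR tree, locking and block device pointers with a plain array; all names are hypothetical stand-ins rather than kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLKBITS 12u	/* assumption: 4KiB blocks */

/* Simplified stand-in for struct erofs_device_info. */
struct dev_range {
	uint32_t mapped_blkaddr;	/* first block in the flat space, 0 = not mapped */
	uint32_t blocks;		/* device length in blocks */
};

/*
 * Find the extra device whose mapped range contains *pa. On a match,
 * rewrite *pa to be relative to that device and return its index.
 * Return -1 (leaving *pa untouched) when no range matches, which
 * corresponds to staying on the primary device.
 */
static int map_flat_address(const struct dev_range *devs, size_t n,
			    uint64_t *pa)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t start, len;

		if (!devs[i].mapped_blkaddr)
			continue;	/* reachable only via an explicit device id */
		start = (uint64_t)devs[i].mapped_blkaddr << BLKBITS;
		len = (uint64_t)devs[i].blocks << BLKBITS;

		if (*pa >= start && *pa < start + len) {
			*pa -= start;
			return (int)i;
		}
	}
	return -1;
}
```

For instance, with a device mapped at block 100 spanning 50 blocks, the flat address `100 * 4096 + 5` resolves to that device at offset 5, while an address below block 100 stays on the primary device.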