- Dec 05, 2019
-
-
Justin Tee authored
7c20f116 ("bio-integrity: stop abusing bi_end_io") moves bio_integrity_free from bio_uninit() to bio_integrity_verify_fn() and bio_endio(). This way looks wrong because bio may be freed without calling bio_endio(), for example, blk_rq_unprep_clone() is called from dm_mq_queue_rq() when the underlying queue of dm-mpath is busy. So memory leak of bio integrity data is caused by commit 7c20f116. Fixes this issue by re-adding bio_integrity_free() to bio_uninit(). Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io") Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by Justin Tee <justin.tee@broadcom.com> Add commit log, and simplify/fix the original patch wroten by Justin. Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Hou Tao authored
bio->bi_blkg will be NULL when the issue of the request has bypassed the block layer as shown in the following oops: Internal error: Oops: 96000005 [#1] SMP CPU: 17 PID: 2996 Comm: scsi_id Not tainted 5.4.0 #4 Call trace: percpu_counter_add_batch+0x38/0x4c8 bfqg_stats_update_legacy_io+0x9c/0x280 bfq_insert_requests+0xbac/0x2190 blk_mq_sched_insert_request+0x288/0x670 blk_execute_rq_nowait+0x140/0x178 blk_execute_rq+0x8c/0x140 sg_io+0x604/0x9c0 scsi_cmd_ioctl+0xe38/0x10a8 scsi_cmd_blk_ioctl+0xac/0xe8 sd_ioctl+0xe4/0x238 blkdev_ioctl+0x590/0x20e0 block_ioctl+0x60/0x98 do_vfs_ioctl+0xe0/0x1b58 ksys_ioctl+0x80/0xd8 __arm64_sys_ioctl+0x40/0x78 el0_svc_handler+0xc4/0x270 so ensure its validity before using it. Fixes: fd41e603 ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios") Signed-off-by:
Hou Tao <houtao1@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Dec 03, 2019
-
-
Christoph Hellwig authored
The current zone revalidation code has a major problem in that it doesn't update the zone size and q->nr_zones atomically, leading to a short window where an out of bounds access to the zone arrays is possible. To fix this move the setting of the zone size into the crticial sections blk_revalidate_disk_zones so that it gets updated together with the zone bitmaps and q->nr_zones. This also slightly simplifies the caller as it deducts the zone size from the report_zones. This change also allows to check for a power of two zone size in generic code. Reported-by:
Hans Holmberg <hans@owltronix.com> Reviewed-by:
Javier González <javier@javigon.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
bio based drivers only need to update q->nr_zones. Do that manually instead of overloading blk_revalidate_disk_zones to keep that function simpler for the next round of changes that will rely even more on the request based functionality. Reviewed-by:
Javier González <javier@javigon.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Allocate the conventional zone bitmap and the sequential zone locking bitmap only when we find a zone of the respective type. This avoids wasting memory on the conventional zone bitmap for devices that only have sequential zones, and will also prepare for other future changes. Reviewed-by:
Javier González <javier@javigon.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Invert the meaning of seq_zones_bitmap by keeping a bitmap of conventional zones. This allows not having a bitmap for devices that do not have conventional zones. Reviewed-by:
Javier González <javier@javigon.com> Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Simplify the arguments to blkdev_nr_zones by passing a gendisk instead of the block_device and capacity. This also removes the need for __blkdev_nr_zones as all callers are outside the fast path and can deal with the additional branch. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 21, 2019
-
-
Jens Axboe authored
We really don't need this, as the slow path will do the right thing anyway. This reverts commit 6952a7f8. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Konstantin Khlebnikov authored
Requests that triggers flushing volatile writeback cache to disk (barriers) have significant effect to overall performance. Block layer has sophisticated engine for combining several flush requests into one. But there is no statistics for actual flushes executed by disk. Requests which trigger flushes usually are barriers - zero-size writes. This patch adds two iostat counters into /sys/class/block/$dev/stat and /proc/diskstats - count of completed flush requests and their total time. Signed-off-by:
Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 20, 2019
-
-
Dmitry Monakhov authored
In most cases blk_tracing is not active, but bfq_log_bfqq macro generate pid_str unconditionally, which result in significant overhead. ## Test modprobe null_blk echo bfq > /sys/block/nullb0/queue/scheduler fio --name=t --ioengine=libaio --direct=1 --filename=/dev/nullb0 \ --runtime=30 --time_based=1 --rw=write --iodepth=128 --bs=4k # Results | | baseline | w/ patch | gain | | iops | 113.19K | 126.42K | +11% | Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Dmitry Monakhov <dmonakhov@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 18, 2019
-
-
Revanth Rajashekar authored
In function 'activate_lsp', rather than hard-coding the short atom header(0x83), we need to let the function 'add_short_atom_header' append the header based on the parameter being appended. The parameter has been defined in Section 3.1.2.1 of https://trustedcomputinggroup.org/wp-content/uploads/TCG_Storage-Opal_Feature_Set_Single_User_Mode_v1-00_r1-00-Final.pdf Reviewed-by:
Jon Derrick <jonathan.derrick@intel.com> Signed-off-by:
Revanth Rajashekar <revanth.rajashekar@intel.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Sebastian Andrzej Siewior authored
trigger_softirq() is always invoked as a SMP-function call which is always invoked with disables interrupts. Don't disable interrupt in trigger_softirq() because interrupts are already disabled. Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 14, 2019
-
-
Jiufei Xue authored
There is a bug that checking the same active_list over and over again in iocg_activate(). The intention of the code was checking whether all the ancestors and self have already been activated. So fix it. Fixes: 7caa4715 ("blkcg: implement blk-iocost") Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
In general drivers should never mess with partition tables directly. Unfortunately s390 and loop do for somewhat historic reasons, but they can use bdev_disk_changed directly instead when we export it as they satisfy the sanity checks we have in __blkdev_reread_part. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by: Stefan Haberland <sth@linux.ibm.com> [dasd] Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
We still have to set the capacity to 0 if invalidating or call revalidate_disk if not even if the disk has no partitions. Fix that by merging rescan_partitions into bdev_disk_changed and just stubbing out blk_add_partitions and blk_drop_partitions for non-partitioned devices. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Large parts of rescan_partitions aren't about partitions, and moving it to block_dev.c will allow for some further cleanups by merging it into its only caller. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
A lot of the logic in invalidate_partitions and rescan_partitions is shared. Merge the two functions to simplify things. There is a small behavior change in that we now send the kevent change notice also if we were not invalidating but no partitions were found, which seems like the right thing to do. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Split out a helper that adds one single partition, and another one calling that dealing with the parsed_partitions state. This makes it much more obvious how we clean up all state and start again when using the rescan label. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
Since commit 3726112e ("block, bfq: re-schedule empty queues if they deserve I/O plugging"), to prevent the service guarantees of a bfq_queue from being violated, the bfq_queue may be left busy, i.e., scheduled for service, even if empty (see comments in __bfq_bfqq_expire() for details). But, if no process will send requests to the bfq_queue any longer, then there is no point in keeping the bfq_queue scheduled for service. In addition, keeping the bfq_queue scheduled for service, but with no process reference any longer, may cause the bfq_queue to be freed when descheduled from service. But this is assumed to never happen, and causes a UAF if it happens. This, in turn, caused crashes [1, 2]. This commit fixes this issue by descheduling an empty bfq_queue when it remains with not process reference. [1] https://bugzilla.redhat.com/show_bug.cgi?id=1767539 [2] https://bugzilla.kernel.org/show_bug.cgi?id=205447 Fixes: 3726112e ("block, bfq: re-schedule empty queues if they deserve I/O plugging") Reported-by:
Chris Evich <cevich@redhat.com> Reported-by:
Patrick Dung <patdung100@gmail.com> Reported-by:
Thorsten Schubert <tschubert@bafh.org> Tested-by:
Thorsten Schubert <tschubert@bafh.org> Tested-by:
Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 13, 2019
-
-
John Garry authored
These functions are not referenced, so delete them. Signed-off-by:
John Garry <john.garry@huawei.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Avoid the need to allocate a potentially large array of struct blk_zone in the block layer by switching the ->report_zones method interface to a callback model. Now the caller simply supplies a callback that is executed on each reported zone, and private data for it. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
No known partitioning tool supports zoned block devices, especially the host managed flavor with strong sequential write constraints. Furthermore, there are also no known user nor use cases for partitioned zoned block devices. This patch removes partition device creation for zoned block devices, which allows simplifying the processing of zone commands for zoned block devices. A warning is added if a partition table is found on the device. For report zones operations no zone sector information remapping is necessary anymore, simplifying the code. Of note is that remapping of zone reports for DM targets is still necessary as done by dm_remap_zone_report(). Similarly, remaping of a zone reset bio is not necessary anymore. Testing for the applicability of the zone reset all request also becomes simpler and only needs to check that the number of sectors of the requested zone range is equal to the disk capacity. Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
All kernel users of blkdev_report_zones() as well as applications use through ioctl(BLKZONEREPORT) expect to potentially get less zone descriptors than requested. As such, the use of the internal report zones command execution loop implemented by blk_report_zones() is not necessary and can even be harmful to performance by causing the execution of inefficient small zones report command to service the reminder of a requested zone array. This patch removes blk_report_zones(), simplifying the code. Also remove a now incorrect comment in dm_blk_report_zones(). Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Javier Gonzalez <javier@javigon.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
blk_revalidate_disk_zones is never called for non-zoned devices. Just return early and warn instead of trying to handle this case. Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
For ZBC and ZAC zoned devices, the scsi driver revalidation processing implemented by sd_revalidate_disk() includes a call to sd_zbc_read_zones() which executes a full disk zone report used to check that all zones of the disk are the same size. This processing is followed by a call to blk_revalidate_disk_zones(), used to initialize the device request queue zone bitmaps (zone type and zone write lock bitmaps). To do so, blk_revalidate_disk_zones() also executes a full device zone report to obtain zone types. As a result, the entire zoned block device revalidation process includes two full device zone report. By moving the zone size checks into blk_revalidate_disk_zones(), this process can be optimized to a single full device zone report, leading to shorter device scan and revalidation times. This patch implements this optimization, reducing the original full device zone report implemented in sd_zbc_check_zones() to a single, small, report zones command execution to obtain the size of the first zone of the device. Checks whether all zones of the device are the same size as the first zone size are moved to the generic blk_check_zone() function called from blk_revalidate_disk_zones(). This optimization also has the following benefits: 1) fewer memory allocations in the scsi layer during disk revalidation as the potentailly large buffer for zone report execution is not needed. 2) Implement zone checks in a generic manner, reducing the burden on device driver which only need to obtain the zone size and check that this size is a power of 2 number of LBAs. Any new type of zoned block device will benefit from this. Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 12, 2019
-
-
Junichi Nomura authored
__bio_try_merge_page() may merge a page to bio without bio_full() check and cause bi_size overflow. The overflow typically ends up with sd_init_command() warning on zero segment request with call trace like this: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 1986 at drivers/scsi/scsi_lib.c:1025 scsi_init_io+0x156/0x180 CPU: 2 PID: 1986 Comm: kworker/2:1H Kdump: loaded Not tainted 5.4.0-rc7 #1 Workqueue: kblockd blk_mq_run_work_fn RIP: 0010:scsi_init_io+0x156/0x180 RSP: 0018:ffffa11487663bf0 EFLAGS: 00010246 RAX: 00000000002be0a0 RBX: ffff8e6e9ff30118 RCX: 0000000000000000 RDX: 00000000ffffffe1 RSI: 0000000000000000 RDI: ffff8e6e9ff30118 RBP: ffffa11487663c18 R08: ffffa11487663d28 R09: ffff8e6e9ff30150 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8e6e9ff30000 R13: 0000000000000001 R14: ffff8e74a1cf1800 R15: ffff8e6e9ff30000 FS: 0000000000000000(0000) GS:ffff8e6ea7680000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff18cf0fe8 CR3: 0000000659f0a001 CR4: 00000000001606e0 Call Trace: sd_init_command+0x326/0xb40 [sd_mod] scsi_queue_rq+0x502/0xaa0 ? blk_mq_get_driver_tag+0xe7/0x120 blk_mq_dispatch_rq_list+0x256/0x5a0 ? elv_rb_del+0x24/0x30 ? deadline_remove_request+0x7b/0xc0 blk_mq_do_dispatch_sched+0xa3/0x140 blk_mq_sched_dispatch_requests+0xfb/0x170 __blk_mq_run_hw_queue+0x81/0x130 blk_mq_run_work_fn+0x1b/0x20 process_one_work+0x179/0x390 worker_thread+0x4f/0x3e0 kthread+0x105/0x140 ? max_active_store+0x80/0x80 ? kthread_bind+0x20/0x20 ret_from_fork+0x35/0x40 ---[ end trace f9036abf5af4a4d3 ]--- blk_update_request: I/O error, dev sdd, sector 2875552 op 0x1:(WRITE) flags 0x0 phys_seg 0 prio class 0 XFS (sdd1): writeback error on sector 2875552 __bio_try_merge_page() should check the overflow before actually doing merge. Fixes: 07173c3e ("block: enable multipage bvecs") Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Hannes Reinecke <hare@suse.de> Signed-off-by:
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 08, 2019
-
-
Ming Lei authored
64K PAGE_SIZE is popular on ARM64 or other ARCHs, and 64K has been big enough to break some devices probably, so change the logic to split bio if the only bvec's length is > SZ_4K instead of PAGE_SIZE. Fixes: fa532287 (block: avoid blk_bio_segment_split for small I/O operations) Cc: Christoph Hellwig <hch@lst.de> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ming Lei authored
Some device may set segment boundary as PAGE_SIZE - 1. If the bvec crosses pages, and meantime its length is <= PAGE_SIZE, we still need to split the bvec into 2 segments. Fixes this issue by still splitting bio if the single bvec crosses pages. Reported-by:
kernel test robot <lkp@intel.com> Fixes: fa532287 (block: avoid blk_bio_segment_split for small I/O operations) Cc: Christoph Hellwig <hch@lst.de> Signed-off-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Nov 07, 2019
-
-
Tejun Heo authored
blkg_rwstat is now only used by bfq-iosched and blk-throtl when on cgroup1. Let's move it into its own files and gate it behind a config option. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
blk-cgroup has been using blkg_rwstat to track basic IO stats. Unfortunately, reading recursive stats scales badly as itinvolves walking all descendants. On systems with a huge number of cgroups (dead or alive), this can lead to substantial CPU cost when reading IO stats. This patch reimplements basic IO stats using cgroup rstat which uses more memory but makes recursive stat reading O(# descendants which have been active since last reading) instead of O(# descendants). * blk-cgroup core no longer uses sync/async stats. Introduce new stat enums - BLKG_IOSTAT_{READ|WRITE|DISCARD}. * Add blkg_iostat[_set] which encapsulates byte and io stats, last values for propagation delta calculation and u64_stats_sync for correctness on 32bit archs. * Update the new percpu stat counters directly and implement blkcg_rstat_flush() to implement propagation. * blkg_print_stat() can now bring the stats up to date by calling cgroup_rstat_flush() and print them instead of directly summing up all descendants. * It now allocates 96 bytes per cpu. It used to be 40 bytes. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Dan Schatzberg <dschatzberg@fb.com> Cc: Daniel Xu <dlxu@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
These don't have users anymore. Remove them. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
When used on cgroup1, blk-throtl uses the blkg->stat_bytes and ->stat_ios from blk-cgroup core to populate four stat knobs. blk-cgroup core is moving away from blkg_rwstat to improve scalability and won't be able to support this usage. It isn't like the sharing gains all that much. Let's break them out to dedicated rwstat counters which are updated when on cgroup1. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
When used on cgroup1, bfq uses the blkg->stat_bytes and ->stat_ios from blk-cgroup core to populate six stat knobs. blk-cgroup core is moving away from blkg_rwstat to improve scalability and won't be able to support this usage. It isn't like the sharing gains all that much. Let's break it out to dedicated rwstat counters which are updated when on cgroup1. This makes use of bfqg_*rwstat*() helpers outside of CONFIG_BFQ_CGROUP_DEBUG. Move them out. v2: Compile fix when !CONFIG_BFQ_CGROUP_DEBUG. Signed-off-by:
Tejun Heo <tj@kernel.org> Cc: Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
Collect them right under #ifdef CONFIG_BFQ_CGROUP_DEBUG. The next patch will use them from !DEBUG path and this makes it easy to move them out of the ifdef block. This is pure code reorganization. Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ajay Joshi authored
Introduce three new ioctl commands BLKOPENZONE, BLKCLOSEZONE and BLKFINISHZONE to allow applications to control the condition of zones on a zoned block device through the execution of the REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH operations. Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig. Reviewed-by:
Javier González <javier@javigon.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by:
Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by:
Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by:
Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by:
Keith Busch <kbusch@kernel.org> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ajay Joshi authored
Zoned block devices (ZBC and ZAC devices) allow an explicit control over the condition (state) of zones. The operations allowed are: * Open a zone: Transition to open condition to indicate that a zone will actively be written * Close a zone: Transition to closed condition to release the drive resources used for writing to a zone * Finish a zone: Transition an open or closed zone to the full condition to prevent write operations To enable this control for in-kernel zoned block device users, define the new request operations REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH as well as the generic function blkdev_zone_mgmt() for submitting these operations on a range of zones. This results in blkdev_reset_zones() removal and replacement with this new zone magement function. Users of blkdev_reset_zones() (f2fs and dm-zoned) are updated accordingly. Contains contributions from Matias Bjorling, Hans Holmberg, Dmitry Fomichev, Keith Busch, Damien Le Moal and Christoph Hellwig. Reviewed-by:
Javier González <javier@javigon.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by:
Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by:
Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by:
Dmitry Fomichev <dmitry.fomichev@wdc.com> Signed-off-by:
Keith Busch <kbusch@kernel.org> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
There is no need for the function __blkdev_reset_all_zones() as REQ_OP_ZONE_RESET_ALL can be handled directly in blkdev_reset_zones() bio loop with an early break from the loop. This patch removes this function and modifies blkdev_reset_zones(), simplifying the code. Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
REQ_OP_ZONE_RESET operations cannot be merged as these bios and requests do not have a size and are never sequential due to the zone start sector position required for their execution. As a result, there is no point in using a plug around blkdev_reset_zones() bio issuing loop. This patch removes this unnecessary plugging. Reviewed-by:
Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by:
Javier González <javier@javigon.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Tejun Heo authored
blkcg_print_stat() iterates blkgs under RCU and doesn't test whether the blkg is online. This can call into pd_stat_fn() on a pd which is still being initialized leading to an oops. The heaviest operation - recursively summing up rwstat counters - is already done while holding the queue_lock. Expand queue_lock to cover the other operations and skip the blkg if it isn't online yet. The online state is protected by both blkcg and queue locks, so this guarantees that only online blkgs are processed. Signed-off-by:
Tejun Heo <tj@kernel.org> Reported-by:
Roman Gushchin <guro@fb.com> Cc: Josef Bacik <jbacik@fb.com> Fixes: 903d23f0 ("blk-cgroup: allow controllers to output their own stats") Cc: stable@vger.kernel.org # v4.19+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-