- Nov 02, 2017
-
-
Greg Kroah-Hartman authored
Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by:
Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by:
Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by:
Thomas Gleixner <tglx@linutronix.de> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Oct 11, 2017
-
-
Al Viro authored
Since "block: support large requests in blk_rq_map_user_iov" we started to call it with partially drained iter; that works fine on the write side, but reads create a copy of iter for completion time. And that needs to take the possibility of ->iov_iter != 0 into account... Cc: stable@vger.kernel.org #v4.5+ Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
we need to take care of failure exit as well - pages already in bio should be dropped by analogue of bio_unmap_pages(), since their refcounts had been bumped only once per reference in bio. Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Vitaly Mayatskikh authored
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if IO vector has small consecutive buffers belonging to the same page. bio_add_pc_page merges them into one, but the page reference is never dropped. Cc: stable@vger.kernel.org Signed-off-by:
Vitaly Mayatskikh <v.mayatskih@gmail.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Oct 04, 2017
-
-
Benjamin Block authored
When under memory-pressure it is possible that the mempool which backs the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count emergency buffers - in case it can't get a regular allocation. These buffers are preallocated and once they are also used, they are re-supplied with old finished requests from the same request_queue (see mempool_free()). The bug is, when re-supplying the emergency pool, the old requests are not again ran through the callback mempool_t->alloc(), and thus also not through the callback bsg_init_rq(). Thus we skip initialization, and while the sense-buffer still should be good, scsi_request->cmd might have become to be an invalid pointer in the meantime. When the request is initialized in bsg.c, and the user's CDB is larger than BLK_MAX_CDB, bsg will replace it with a custom allocated buffer, which is freed when the user's command is finished, thus it dangles afterwards. When next a command is sent by the user that has a smaller/similar CDB as BLK_MAX_CDB, bsg will assume that scsi_request->cmd is backed by scsi_request->__cmd, will not make a custom allocation, and write into undefined memory. Fix this by splitting bsg_init_rq() into two functions: - bsg_init_rq() is changed to only do the allocation of the sense-buffer, which is used to back the bsg job's reply buffer. This pointer should never change during the lifetime of a scsi_request, so it doesn't need re-initialization. - bsg_initialize_rq() is a new function that makes use of 'struct request_queue's initialize_rq_fn callback (which was introduced in v4.12). This is always called before the request is given out via blk_get_request(). This function does the remaining initialization that was previously done in bsg_init_rq(), and will also do it when the request is taken from the emergency-pool of the backing mempool. Fixes: 50b4d485 ("bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer") Cc: <stable@vger.kernel.org> # 4.11+ Reviewed-by:
Hannes Reinecke <hare@suse.com> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Oct 03, 2017
-
-
Omar Sandoval authored
In blk_mq_debugfs_register(), I remembered to set up the per-hctx sched directories if a default scheduler was already configured by blk_mq_sched_init() from blk_mq_init_allocated_queue(), but I didn't do the same for the device-wide sched directory. Fix it. Fixes: d332ce09 ("blk-mq-debugfs: allow schedulers to register debugfs attributes") Signed-off-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Joseph Qi authored
There is a case which will lead to io stall. The case is described as follows. /test1 |-subtest1 /test2 |-subtest2 And subtest1 and subtest2 each has 32 queued bios already. Now upgrade to max. In throtl_upgrade_state, it will try to dispatch bios as follows: 1) tg=subtest1, do nothing; 2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending left, no need to schedule next dispatch; 3) tg=subtest2, do nothing; 4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending left, no need to schedule next dispatch; 5) tg=/, transfer 8 queued bios from test1 to /, 8 queued bios from test2 to /, 8 queued bios from test1 to /, and 8 queued bios from test2 to /; note that test1 and test2 each still has 16 queued bios left; 6) tg=/, try to schedule next dispatch, but since disptime is now (update in tg_update_disptime, wait=0), pending timer is not scheduled in fact; 7) In throtl_upgrade_state it totally dispatches 32 queued bios and with 32 left. test1 and test2 each has 16 queued bios; 8) throtl_pending_timer_fn sees the left over bios, but could do nothing, because throtl_select_dispatch returns 0, and test1/test2 has no pending tg. The blktrace shows the following: 8,32 0 0 2.539007641 0 m N throtl upgrade to max 8,32 0 0 2.539072267 0 m N throtl /test2 dispatch nr_queued=16 read=0 write=16 8,32 7 0 2.539077142 0 m N throtl /test1 dispatch nr_queued=16 read=0 write=16 So force schedule dispatch if there are pending children. Reviewed-by:
Shaohua Li <shli@fb.com> Signed-off-by:
Joseph Qi <qijiang.qj@alibaba-inc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 25, 2017
-
-
Shaohua Li authored
part_stat_show takes a part device not a disk, so we should use part_to_disk. Fixes: d62e26b3("block: pass in queue to inflight accounting") Cc: Bart Van Assche <bart.vanassche@wdc.com> Cc: Omar Sandoval <osandov@fb.com> Signed-off-by:
Shaohua Li <shli@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Waiman Long authored
The lockdep code had reported the following unsafe locking scenario: CPU0 CPU1 ---- ---- lock(s_active#228); lock(&bdev->bd_mutex/1); lock(s_active#228); lock(&bdev->bd_mutex); *** DEADLOCK *** The deadlock may happen when one task (CPU1) is trying to delete a partition in a block device and another task (CPU0) is accessing tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that partition. The s_active isn't an actual lock. It is a reference count (kn->count) on the sysfs (kernfs) file. Removal of a sysfs file, however, require a wait until all the references are gone. The reference count is treated like a rwsem using lockdep instrumentation code. The fact that a thread is in the sysfs callback method or in the ioctl call means there is a reference to the opended sysfs or device file. That should prevent the underlying block structure from being removed. Instead of using bd_mutex in the block_device structure, a new blk_trace_mutex is now added to the request_queue structure to protect access to the blk_trace structure. Suggested-by:
Christoph Hellwig <hch@infradead.org> Signed-off-by:
Waiman Long <longman@redhat.com> Acked-by:
Steven Rostedt (VMware) <rostedt@goodmis.org> Fix typo in patch subject line, and prune a comment detailing how the code used to work. Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
The job structure is allocated as part of the request, so we should not free it in the error path of bsg_prepare_job. Signed-off-by:
Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 11, 2017
-
-
Jens Axboe authored
A NULL pointer crash was reported for the case of having the BFQ IO scheduler attached to the underlying blk-mq paths of a DM multipath device. The crash occured in blk_mq_sched_insert_request()'s call to e->type->ops.mq.insert_requests(). Paolo Valente correctly summarized why the crash occured with: "the call chain (dm_mq_queue_rq -> map_request -> setup_clone -> blk_rq_prep_clone) creates a cloned request without invoking e->type->ops.mq.prepare_request for the target elevator e. The cloned request is therefore not initialized for the scheduler, but it is however inserted into the scheduler by blk_mq_sched_insert_request." All said, a request-based DM multipath device's IO scheduler should be the only one used -- when the original requests are issued to the underlying paths as cloned requests they are inserted directly in the underlying dispatch queue(s) rather than through an additional elevator. But commit bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers") switched blk_insert_cloned_request() from using blk_mq_insert_request() to blk_mq_sched_insert_request(). Which incorrectly added elevator machinery into a call chain that isn't supposed to have any. To fix this introduce a blk-mq private blk_mq_request_bypass_insert() that blk_insert_cloned_request() calls to insert the request without involving any elevator that may be attached to the cloned request's request_queue. Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers") Cc: stable@vger.kernel.org Reported-by:
Bart Van Assche <Bart.VanAssche@wdc.com> Tested-by:
Mike Snitzer <snitzer@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Mikulas Patocka authored
Fix possible integer overflow in __blkdev_sectors_to_bio_pages if sector_t is 32-bit. Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Fixes: 615d22a5 ("block: Fix __blkdev_issue_zeroout loop") Reviewed-by:
Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Scott Bauer authored
Users who are booting off their Opal enabled drives are having issues when they have a shadow MBR set up after s3/resume cycle. When the Drive has a shadow MBR setup the MBRDone flag is set to false upon power loss (S3/S4/S5). When the MBRDone flag is false I/O to LBA 0 -> LBA_END_MBR are remapped to the shadow mbr of the drive. If the drive contains useful data in the 0 -> end_mbr range upon s3 resume the user can never get to that data as the drive will keep remapping it to the MBR. To fix this when we unlock on S3 resume, we need to tell the drive that we're done with the shadow mbr (even though we didnt use it) by setting true to MBRDone. This way the drive will stop the remapping and the user can access their data. Acked-by Jon Derrick: <jonathan.derrick@intel.com> Signed-off-by:
Scott Bauer <scott.bauer@intel.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Sep 09, 2017
-
-
Davidlohr Bueso authored
For the same reasons we already cache the leftmost pointer, apply the same optimization for rb_last() calls. Users must explicitly do this as rb_root_cached only deals with the smallest node. [dave@stgolabs.net: brain fart #1] Link: http://lkml.kernel.org/r/20170731155955.GD21328@linux-80c1.suse Link: http://lkml.kernel.org/r/20170719014603.19029-18-dave@stgolabs.net Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Cc: Jens Axboe <axboe@fb.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Davidlohr Bueso authored
... with the generic rbtree flavor instead. No changes in semantics whatsoever. Link: http://lkml.kernel.org/r/20170719014603.19029-11-dave@stgolabs.net Signed-off-by:
Davidlohr Bueso <dbueso@suse.de> Reviewed-by:
Jan Kara <jack@suse.cz> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Jens Axboe <axboe@fb.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 01, 2017
-
-
Bart Van Assche authored
Some code uses icq_to_bic() to convert an io_cq pointer to a bfq_io_cq pointer while other code uses a direct cast. Convert the code that uses a direct cast such that it uses icq_to_bic(). Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
This patch avoids that the following warnings are reported when building with W=1: block/bfq-iosched.c: In function 'bfq_back_seek_max_store': block/bfq-iosched.c:4860:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/bfq-iosched.c:4876:1: note: in expansion of macro 'STORE_FUNCTION' STORE_FUNCTION(bfq_back_seek_max_store, &bfqd->bfq_back_max, 0, INT_MAX, 0); ^~~~~~~~~~~~~~ block/bfq-iosched.c: In function 'bfq_slice_idle_store': block/bfq-iosched.c:4860:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/bfq-iosched.c:4879:1: note: in expansion of macro 'STORE_FUNCTION' STORE_FUNCTION(bfq_slice_idle_store, &bfqd->bfq_slice_idle, 0, INT_MAX, 2); ^~~~~~~~~~~~~~ block/bfq-iosched.c: In function 'bfq_slice_idle_us_store': block/bfq-iosched.c:4892:13: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits] if (__data < (MIN)) \ ^ block/bfq-iosched.c:4899:1: note: in expansion of macro 'USEC_STORE_FUNCTION' USEC_STORE_FUNCTION(bfq_slice_idle_us_store, &bfqd->bfq_slice_idle, 0, ^~~~~~~~~~~~~~~~~~~ Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Make sysfs writes fail for invalid numbers instead of storing uninitialized data copied from the stack. This patch removes all uninitialized_var() occurrences from the BFQ source code. Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
This patch avoids that gcc 7 issues a warning about fall-through when building with W=1. Acked-by:
Paolo Valente <paolo.valente@linaro.org> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 31, 2017
-
-
Bart Van Assche authored
This patch avoids that sparse reports the following warning messages: block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces) block/compat_ioctl.c:85:11: expected unsigned long *[noderef] <asn:1>p block/compat_ioctl.c:85:11: got void [noderef] <asn:1>* block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces) block/compat_ioctl.c:91:21: expected void const volatile [noderef] <asn:1>*<noident> block/compat_ioctl.c:91:21: got unsigned long *[noderef] <asn:1>p block/compat_ioctl.c:87:53: warning: dereference of noderef expression block/compat_ioctl.c:91:21: warning: dereference of noderef expression Fixes: commit d597580d ("generic ...copy_..._user primitives") Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Paolo Valente authored
If the function bfq_update_next_in_service is invoked as a consequence of the activation or requeueing of an entity, say E, then it doesn't invoke bfq_lookup_next_entity to get the next-in-service entity. In contrast, it follows a shorter path: if E happens to be eligible (see commit "bfq-sq-mq: make lookup_next_entity push up vtime on expirations" for details on eligibility) and to have a lower virtual finish time than the current candidate as next-in-service entity, then E directly becomes the next-in-service entity. Unfortunately, there is a corner case for which this shorter path makes bfq_update_next_in_service choose a non eligible entity: it occurs if both E and the current next-in-service entity happen to be non eligible when bfq_update_next_in_service is invoked. In this case, E is not set as next-in-service, and, since bfq_lookup_next_entity is not invoked, the state of the parent entity is not updated so as to end up with an eligible entity as the proper next-in-service entity. In this respect, next-in-service is actually allowed to be non eligible while some queue is in service: since no system-virtual-time push-up can be performed in that case (see again commit "bfq-sq-mq: make lookup_next_entity push up vtime on expirations" for details), next-in-service is chosen, speculatively, as a function of the possible value that the system virtual time may get after a push up. But the correctness of the schedule breaks if next-in-service is still a non eligible entity when it is time to set in service the next entity. Unfortunately, this may happen in the above corner case. This commit fixes this problem by making bfq_update_next_in_service invoke bfq_lookup_next_entity not only if the above shorter path cannot be taken, but also if the shorter path is taken but fails to yield an eligible next-in-service entity. Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Tested-by:
Lee Tibbert <lee.tibbert@gmail.com> Tested-by:
Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
If the function bfq_update_next_in_service is invoked as a consequence of the activation or requeueing of an entity, say E, and finds out that E belongs to a higher-priority class than that of the current next-in-service entity, then it sets next_in_service directly to E. But this may lead to anomalous schedules, because E may happen not be eligible for service, because its virtual start time is higher than the system virtual time for its service tree. This commit addresses this issue by simply removing this direct switch. Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Tested-by:
Lee Tibbert <lee.tibbert@gmail.com> Tested-by:
Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Paolo Valente authored
To provide a very smooth service, bfq starts to serve a bfq_queue only if the queue is 'eligible', i.e., if the same queue would have started to be served in the ideal, perfectly fair system that bfq simulates internally. This is obtained by associating each queue with a virtual start time, and by computing a special system virtual time quantity: a queue is eligible only if the system virtual time has reached the virtual start time of the queue. Finally, bfq guarantees that, when a new queue must be set in service, there is always at least one eligible entity for each active parent entity in the scheduler. To provide this guarantee, the function __bfq_lookup_next_entity pushes up, for each parent entity on which it is invoked, the system virtual time to the minimum among the virtual start times of the entities in the active tree for the parent entity (more precisely, the push up occurs if the system virtual time happens to be lower than all such virtual start times). There is however a circumstance in which __bfq_lookup_next_entity cannot push up the system virtual time for a parent entity, even if the system virtual time is lower than the virtual start times of all the child entities in the active tree. It happens if one of the child entities is in service. In fact, in such a case, there is already an eligible entity, the in-service one, even if it may not be not present in the active tree (because in-service entities may be removed from the active tree). Unfortunately, in the last re-design of the hierarchical-scheduling engine, the reset of the pointer to the in-service entity for a given parent entity--reset to be done as a consequence of the expiration of the in-service entity--always happens after the function __bfq_lookup_next_entity has been invoked. This causes the function to think that there is still an entity in service for the parent entity, and then that the system virtual time cannot be pushed up, even if actually such a no-more-in-service entity has already been properly reinserted into the active tree (or in some other tree if no more active). Yet, the system virtual time *had* to be pushed up, to be ready to correctly choose the next queue to serve. Because of the lack of this push up, bfq may wrongly set in service a queue that had been speculatively pre-computed as the possible next-in-service queue, but that would no more be the one to serve after the expiration and the reinsertion into the active trees of the previously in-service entities. This commit addresses this issue by making __bfq_lookup_next_entity properly push up the system virtual time if an expiration is occurring. Signed-off-by:
Paolo Valente <paolo.valente@linaro.org> Tested-by:
Lee Tibbert <lee.tibbert@gmail.com> Tested-by:
Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 30, 2017
-
-
Christoph Hellwig authored
The SAS code will need it. Also mark the name argument const to match bsg_register_queue. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com>
-
- Aug 29, 2017
-
-
Christoph Hellwig authored
Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ben Hutchings authored
The block core requests modules with the "-iosched" name suffix, but mq-deadline does not have that suffix. Add an alias. Fixes: 945ffb60 ("mq-deadline: add blk-mq adaptation of the deadline ...") Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ben Hutchings authored
The block core requests modules with the "-iosched" name suffix, but bfq no longer has that suffix. Add an alias. Fixes: ea25da48 ("block, bfq: split bfq-iosched.c into multiple ...") Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Damien Le Moal authored
The only caller of this function is blk_start_request() in the same file. Fix blk_start_request() description accordingly. Reviewed-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by:
Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Ying Huang authored
struct call_single_data is used in IPIs to transfer information between CPUs. Its size is bigger than sizeof(unsigned long) and less than cache line size. Currently it is not allocated with any explicit alignment requirements. This makes it possible for allocated call_single_data to cross two cache lines, which results in double the number of the cache lines that need to be transferred among CPUs. This can be fixed by requiring call_single_data to be aligned with the size of call_single_data. Currently the size of call_single_data is the power of 2. If we add new fields to call_single_data, we may need to add padding to make sure the size of new definition is the power of 2 as well. Fortunately, this is enforced by GCC, which will report bad sizes. To set alignment requirements of call_single_data to the size of call_single_data, a struct definition and a typedef is used. To test the effect of the patch, I used the vm-scalability multiple thread swap test case (swap-w-seq-mt). The test will create multiple threads and each thread will eat memory until all RAM and part of swap is used, so that huge number of IPIs are triggered when unmapping memory. In the test, the throughput of memory writing improves ~5% compared with misaligned call_single_data, because of faster IPIs. Suggested-by:
Peter Zijlstra <peterz@infradead.org> Signed-off-by:
Huang, Ying <ying.huang@intel.com> [ Add call_single_data_t and align with size of call_single_data. ] Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Aaron Lu <aaron.lu@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Aug 28, 2017
-
-
David Jeffery authored
There is a race between changing I/O elevator and request_queue removal which can trigger the warning in kobject_add_internal. A program can use sysfs to request a change of elevator at the same time another task is unregistering the request_queue the elevator would be attached to. The elevator's kobject will then attempt to be connected to the request_queue in the object tree when the request_queue has just been removed from sysfs. This triggers the warning in kobject_add_internal as the request_queue no longer has a sysfs directory: kobject_add_internal failed for iosched (error: -2 parent: queue) ------------[ cut here ]------------ WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0 To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when changing the elevator and use the request_queue's sysfs_lock to serialize between clearing the flag and the elevator testing the flag. Signed-off-by:
David Jeffery <djeffery@redhat.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Reviewed-by:
Ming Lei <ming.lei@redhat.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
weiping zhang authored
The last parameter "count" never be used in xxx_var_store, convert these functions to void. Signed-off-by:
weiping zhang <zhangweiping@didichuxing.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 25, 2017
-
-
weiping zhang authored
this patch fix two errors, firstly avoid kfree blk_root, secondly not free(blkcg) ,if blkcg alloc fail(blkcg == NULL), just unlock that mutex; Signed-off-by:
weiping zhang <zhangweiping@didichuxing.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Shaohua Li authored
The discard bio doesn't attach the original bio cgroup info. Normal bio is cloned, so is fine. Signed-off-by:
Shaohua Li <shli@fb.com>
-
Omar Sandoval authored
Normally I wouldn't bother with this, but in my opinion the comments are the most important part of this whole file since without them no one would have any clue how this insanity works. Signed-off-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Bart Van Assche authored
The symbolic constants QUEUE_FLAG_SCSI_PASSTHROUGH, QUEUE_FLAG_QUIESCED and REQ_NOWAIT are missing from blk-mq-debugfs.c. Add these to blk-mq-debugfs.c such that these appear as names in debugfs instead of as numbers. Reviewed-by:
Omar Sandoval <osandov@fb.com> Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Cc: Hannes Reinecke <hare@suse.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
- Aug 24, 2017
-
-
Bart Van Assche authored
This patch avoids that sparse reports the following warning messages: block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces) block/compat_ioctl.c:85:11: expected unsigned long *[noderef] <asn:1>p block/compat_ioctl.c:85:11: got void [noderef] <asn:1>* block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces) block/compat_ioctl.c:91:21: expected void const volatile [noderef] <asn:1>*<noident> block/compat_ioctl.c:91:21: got unsigned long *[noderef] <asn:1>p block/compat_ioctl.c:87:53: warning: dereference of noderef expression block/compat_ioctl.c:91:21: warning: dereference of noderef expression Fixes: commit d597580d ("generic ...copy_..._user primitives") Signed-off-by:
Bart Van Assche <bart.vanassche@wdc.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
weiping zhang authored
put_device(pdev) will call pdev->type->release finally, and blk_free_devt has been called in part_release(), so remove it. Signed-off-by:
weiping zhang <zhangweiping@didichuxing.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Benjamin Block authored
Since we split the scsi_request out of struct request bsg fails to provide a reply-buffer for the drivers. This was done via the pointer for sense-data, that is not preallocated anymore. Failing to allocate/assign it results in illegal dereferences because LLDs use this pointer unquestioned. An example panic on s390x, using the zFCP driver, looks like this (I had debugging on, otherwise NULL-pointer dereferences wouldn't even panic on s390x): Unable to handle kernel pointer dereference in virtual kernel address space Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403 Fault in home space mode while using kernel ASCE. AS:0000000001590007 R3:0000000000000024 Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: <Long List> CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3 Hardware name: IBM 2964 N96 702 (z/VM 6.4.0) task: 0000000065cb0100 task.stack: 0000000065cb4000 Krnl PSW : 0704e00180000000 000003ff801e4156 (zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3 Krnl GPRS: 0000000000000001 000000005fa9d0d0 000000005fa9d078 0000000000e16866 000003ff00000290 6b6b6b6b6b6b6b6b 0000000059f78f00 000000000000000f 00000000593a0958 00000000593a0958 0000000060d88800 000000005ddd4c38 0000000058b50100 07000000659cba08 000003ff801e8556 00000000659cb9a8 Krnl Code: 000003ff801e4146: e31020500004 lg %r1,80(%r2) 000003ff801e414c: 58402040 l %r4,64(%r2) #000003ff801e4150: e35020200004 lg %r5,32(%r2) >000003ff801e4156: 50405004 st %r4,4(%r5) 000003ff801e415a: e54c50080000 mvhi 8(%r5),0 000003ff801e4160: e33010280012 lt %r3,40(%r1) 000003ff801e4166: a718fffb lhi %r1,-5 000003ff801e416a: 1803 lr %r0,%r3 Call Trace: ([<000003ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp]) [<000003ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp] [<000003ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp] [<00000000009b91b6>] qdio_kick_handler+0x2ae/0x2c8 [<00000000009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10 [<00000000001684c2>] tasklet_action+0x15a/0x1d8 [<0000000000bd28ec>] __do_softirq+0x3ec/0x848 [<00000000001675a4>] irq_exit+0x74/0xf8 [<000000000010dd6a>] do_IRQ+0xba/0xf0 [<0000000000bd19e8>] io_int_handler+0x104/0x2d4 [<00000000001033b6>] enabled_wait+0xb6/0x188 ([<000000000010339e>] enabled_wait+0x9e/0x188) [<000000000010396a>] arch_cpu_idle+0x32/0x50 [<0000000000bd0112>] default_idle_call+0x52/0x68 [<00000000001cd0fa>] do_idle+0x102/0x188 [<00000000001cd41e>] cpu_startup_entry+0x3e/0x48 [<0000000000118c64>] smp_start_secondary+0x11c/0x130 [<0000000000bd2016>] restart_int_handler+0x62/0x78 [<0000000000000000>] (null) INFO: lockdep is turned off. Last Breaking-Event-Address: [<000003ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp] Kernel panic - not syncing: Fatal exception in interrupt This patch moves bsg-lib to allocate and setup struct bsg_job ahead of time, including the allocation of a buffer for the reply-data. This means, struct bsg_job is not allocated separately anymore, but as part of struct request allocation - similar to struct scsi_cmd. Reflect this in the function names that used to handle creation/destruction of struct bsg_job. Reported-by:
Steffen Maier <maier@linux.vnet.ibm.com> Suggested-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Benjamin Block <bblock@linux.vnet.ibm.com> Fixes: 82ed4db4 ("block: split scsi_request out of struct request") Cc: <stable@vger.kernel.org> #4.11+ Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-
Milan Broz authored
In dm-integrity target we register integrity profile that have both generate_fn and verify_fn callbacks set to NULL. This is used if dm-integrity is stacked under a dm-crypt device for authenticated encryption (integrity payload contains authentication tag and IV seed). In this case the verification is done through own crypto API processing inside dm-crypt; integrity profile is only holder of these data. (And memory is owned by dm-crypt as well.) After the commit (and previous changes) Commit 7c20f116 Author: Christoph Hellwig <hch@lst.de> Date: Mon Jul 3 16:58:43 2017 -0600 bio-integrity: stop abusing bi_end_io we get this crash: : BUG: unable to handle kernel NULL pointer dereference at (null) : IP: (null) : *pde = 00000000 ... : : Workqueue: kintegrityd bio_integrity_verify_fn : task: f48ae180 task.stack: f4b5c000 : EIP: (null) : EFLAGS: 00210286 CPU: 0 : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000 : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4 : DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0 : Call Trace: : ? bio_integrity_process+0xe3/0x1e0 : bio_integrity_verify_fn+0xea/0x150 : process_one_work+0x1c7/0x5c0 : worker_thread+0x39/0x380 : kthread+0xd6/0x110 : ? process_one_work+0x5c0/0x5c0 : ? kthread_worker_fn+0x100/0x100 : ? kthread_worker_fn+0x100/0x100 : ret_from_fork+0x19/0x24 : Code: Bad EIP value. : EIP: (null) SS:ESP: 0068:f4b5dea4 : CR2: 0000000000000000 Patch just skip the whole verify workqueue if verify_fn is set to NULL. Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io") Signed-off-by:
Milan Broz <gmazyland@gmail.com> [hch: trivial whitespace fix] Signed-off-by:
Christoph Hellwig <hch@lst.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk>
-