summaryrefslogtreecommitdiffstats
path: root/block
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | block: enumify ELEVATOR_*_MERGEChristoph Hellwig2017-02-089-89/+86Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch these constants to an enum, and make let the compiler ensure that all callers of blk_try_merge and elv_merge handle all potential values. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: move req_set_nomerge to blk.hChristoph Hellwig2017-02-082-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes it available outside of blk-merge.c, and inlining such a trivial helper seems pretty useful to start with. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-sched: (un)register elevator when (un)registering queueOmar Sandoval2017-02-061-11/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I noticed that when booting with a default blk-mq I/O scheduler, the /sys/block/*/queue/iosched directory was missing. However, switching after boot did create the directory. This is because we skip the initial elevator register/unregister when we don't have a ->request_fn(), but we should still do it for the ->mq_ops case. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: free merged request in the callerJens Axboe2017-02-035-12/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we end up doing a request-to-request merge when we have completed a bio-to-request merge, we free the request from deep down in that path. For blk-mq-sched, the merge path has to hold the appropriate lock, but we don't need it for freeing the request. And in fact holding the lock is problematic, since we are now calling the mq sched put_rq_private() hook with the lock held. Other call paths do not hold this lock. Fix this inconsistency by ensuring that the caller frees a merged request. Then we can do it outside of the lock, making it both more efficient and fixing the blk-mq-sched problem of invoking parts of the scheduler with an unknown lock state. Reported-by: Paolo Valente <paolo.valente@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
| | * | blk-merge: return the merged requestJens Axboe2017-02-032-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we attempt to merge request-to-request, we return a 0/1 if we ended up merging or not. Change that to return the pointer to the request that we freed. We will use this to move the freeing of that request out of the merge logic, so that callers can drop locks before freeing the request. There should be no functional changes in this patch. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com>
| | * | blkcg: fix double free of new_blkg in blkcg_init_queueHou Tao2017-02-031-3/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If blkg_create fails, new_blkg passed as an argument will be freed by blkg_create, so there is no need to free it again. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-sched: bypass the scheduler for flushes entirelyOmar Sandoval2017-02-033-6/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a weird inconsistency that flushes are mostly hidden from the scheduler, but it needs to be aware of them in ->insert_requests(). Instead of having every scheduler call blk_mq_sched_bypass_insert(), let's do it in the common framework. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq: move debugfs_remove() of disk dir to blk_release_queue()Omar Sandoval2017-02-022-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This needs to happen after we tear down blktrace. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: use same block debugfs directory for blk-mq and blktraceOmar Sandoval2017-02-025-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I added the blk-mq debugging information to debugfs, I didn't notice that blktrace also creates a "block" directory in debugfs. Make them use the same dentry, now created in the core block code. Based on a patch from Jens. Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | scsi, block: fix duplicate bdi name registration crashesDan Williams2017-02-022-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Warnings of the following form occur because scsi reuses a devt number while the block layer still has it referenced as the name of the bdi [1]: WARNING: CPU: 1 PID: 93 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x62/0x80 sysfs: cannot create duplicate filename '/devices/virtual/bdi/8:192' [..] Call Trace: dump_stack+0x86/0xc3 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5f/0x80 ? kernfs_path_from_node+0x4f/0x60 sysfs_warn_dup+0x62/0x80 sysfs_create_dir_ns+0x77/0x90 kobject_add_internal+0xb2/0x350 kobject_add+0x75/0xd0 device_add+0x15a/0x650 device_create_groups_vargs+0xe0/0xf0 device_create_vargs+0x1c/0x20 bdi_register+0x90/0x240 ? lockdep_init_map+0x57/0x200 bdi_register_owner+0x36/0x60 device_add_disk+0x1bb/0x4e0 ? __pm_runtime_use_autosuspend+0x5c/0x70 sd_probe_async+0x10d/0x1c0 async_run_entry_fn+0x39/0x170 This is a brute-force fix to pass the devt release information from sd_probe() to the locations where we register the bdi, device_add_disk(), and unregister the bdi, blk_cleanup_queue(). Thanks to Omar for the quick reproducer script [2]. This patch survives where an unmodified kernel fails in a few seconds. [1]: https://marc.info/?l=linux-scsi&m=147116857810716&w=4 [2]: http://marc.info/?l=linux-block&m=148554717109098&w=2 Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Jan Kara <jack@suse.cz> Reported-by: Omar Sandoval <osandov@osandov.com> Tested-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: Get rid of blk_get_backing_dev_info()Jan Kara2017-02-023-24/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_get_backing_dev_info() is now a simple dereference. Remove that function and simplify some code around that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: Make blk_get_backing_dev_info() safe without open bdevJan Kara2017-02-021-5/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currenly blk_get_backing_dev_info() is not safe to be called when the block device is not open as bdev->bd_disk is NULL in that case. However inode_to_bdi() uses this function and may be call called from flusher worker or other writeback related functions without bdev being open which leads to crashes such as: [113031.075540] Unable to handle kernel paging request for data at address 0x00000000 [113031.075614] Faulting instruction address: 0xc0000000003692e0 0:mon> t [c0000000fb65f900] c00000000036cb6c writeback_sb_inodes+0x30c/0x590 [c0000000fb65fa10] c00000000036ced4 __writeback_inodes_wb+0xe4/0x150 [c0000000fb65fa70] c00000000036d33c wb_writeback+0x30c/0x450 [c0000000fb65fb40] c00000000036e198 wb_workfn+0x268/0x580 [c0000000fb65fc50] c0000000000f3470 process_one_work+0x1e0/0x590 [c0000000fb65fce0] c0000000000f38c8 worker_thread+0xa8/0x660 [c0000000fb65fd80] c0000000000fc4b0 kthread+0x110/0x130 [c0000000fb65fe30] c0000000000098f0 ret_from_kernel_thread+0x5c/0x6c Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: Dynamically allocate and refcount backing_dev_infoJan Kara2017-02-022-8/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of storing backing_dev_info inside struct request_queue, allocate it dynamically, reference count it, and free it when the last reference is dropped. Currently only request_queue holds the reference but in the following patch we add other users referencing backing_dev_info. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: Use pointer to backing_dev_info from request_queueJan Kara2017-02-027-28/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will want to have struct backing_dev_info allocated separately from struct request_queue. As the first step add pointer to backing_dev_info to request_queue and convert all users touching it. No functional changes in this patch. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: Unhash block device inodes on gendisk destructionJan Kara2017-02-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, block device inodes stay around after corresponding gendisk hash died until memory reclaim finds them and frees them. Since we will make block device inode pin the bdi, we want to free the block device inode as soon as the device goes away so that bdi does not stay around unnecessarily. Furthermore we need to avoid issues when new device with the same major,minor pair gets created since reusing the bdi structure would be rather difficult in this case. Unhashing block device inode on gendisk destruction nicely deals with these problems. Once last block device inode reference is dropped (which may be directly in del_gendisk()), the inode gets evicted. Furthermore if the major,minor pair gets reallocated, we are guaranteed to get new block device inode even if old block device inode is not yet evicted and thus we avoid issues with possible reuse of bdi. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-debug: Introduce debugfs_create_files()Bart Van Assche2017-02-011-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the two debugfs_create_file() loops by a call to the new debugfs_create_files() function. Add an empty element at the end of the two attribute arrays such that the array size does not have to be passed to debugfs_create_files(). Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-debug: Make show() operations interruptibleBart Van Assche2017-02-011-8/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow users to interrupt show operations instead of making a user space process unkillable if ownership of q->sysfs_lock cannot be obtained. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-debug: Avoid that sparse complains about req_flags_t usageBart Van Assche2017-02-012-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid that sparse reports the following complaints: block/elevator.c:541:29: warning: incorrect type in assignment (different base types) block/elevator.c:541:29: expected bool [unsigned] [usertype] next_sorted block/elevator.c:541:29: got restricted req_flags_t block/blk-mq-debugfs.c:92:54: warning: cast from restricted req_flags_t Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-debugfs: Add missing __acquires() / __releases() annotationsBart Van Assche2017-02-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch avoids that sparse complains about lock imbalances. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: fold cmd_type into the REQ_OP_ spaceChristoph Hellwig2017-01-317-36/+24Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of keeping two levels of indirection for requests types, fold it all into the operations. The little caveat here is that previously cmd_type only applied to struct request, while the request and bio op fields were set to plain REQ_OP_READ/WRITE even for passthrough operations. Instead this patch adds new REQ_OP_* for SCSI passthrough and driver private requests, althought it has to add two for each so that we can communicate the data in/out nature of the request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: introduce blk_rq_is_passthroughChristoph Hellwig2017-01-315-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This can be used to check for fs vs non-fs requests and basically removes all knowledge of BLOCK_PC specific from the block layer, as well as preparing for removing the cmd_type field in struct request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: make scsi_request and scsi ioctl support optionalChristoph Hellwig2017-01-312-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need this code to support scsi, ide, cciss and virtio. And at least for virtio it's a deprecated feature to start with. This should shrink the kernel size for embedded device that only use, say eMMC a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: don't assign cmd_flags in __blk_rq_prep_cloneChristoph Hellwig2017-01-271-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These days we have the proper flags set since request allocation time. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: split scsi_request out of struct requestChristoph Hellwig2017-01-276-128/+87Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And require all drivers that want to support BLOCK_PC to allocate it as the first thing of their private data. To support this the legacy IDE and BSG code is switched to set cmd_size on their queues to let the block layer allocate the additional space. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block/bsg: move queue creation into bsg_setup_queueChristoph Hellwig2017-01-271-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simply the boilerplate code needed for bsg nodes a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: allow specifying size for extra command dataChristoph Hellwig2017-01-273-17/+54
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This mirrors the blk-mq capabilities to allocate extra drivers-specific data behind struct request by setting a cmd_size field, as well as having a constructor / destructor for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: simplify blk_init_allocated_queueChristoph Hellwig2017-01-271-23/+15Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return an errno value instead of the passed in queue so that the callers don't have to keep track of two queues, and move the assignment of the request_fn and lock to the caller as passing them as argument doesn't simplify anything. While we're at it also remove two pointless NULL assignments, given that the request structure is zeroed on allocation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: fix elevator init checkChristoph Hellwig2017-01-271-22/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can't initalize the elevator fields for flushes as flush share space in struct request with the elevator data. But currently we can't communicate that a request is a flush through blk_get_request as we can only pass READ or WRITE, and the low-level code looks at the possible NULL bio to check for a flush. Fix this by allowing to pass any block op and flags, and by checking for the flush flags in __get_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | Merge branch 'for-4.11/block' into for-4.11/rq-refactorJens Axboe2017-01-2726-634/+2827
| | |\ \ | | | | | | | | | | | | | | | Signed-off-by: Jens Axboe <axboe@fb.com>
| * | \ \ Merge branch 'for-4.11/block' into for-4.11/linus-mergeJens Axboe2017-02-1729-641/+5782
| |\ \ \ \ | | |_|_|/ | |/| | | | | | | | Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | block/sed-opal: allocate struct opal_dev dynamicallyChristoph Hellwig2017-02-172-10/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Insted of bloating the containing structure with it all the time this allocates struct opal_dev dynamically. Additionally this allows moving the definition of struct opal_dev into sed-opal.c. For this a new private data field is added to it that is passed to the send/receive callback. After that a lot of internals can be made private as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Scott Bauer <scott.bauer@intel.com> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | block/sed-opal: tone down not supported warningsChristoph Hellwig2017-02-171-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not having OPAL or a sub-feature supported is an entirely normal condition for many drives, so don't warn about it. Keep the messages, but tone them down to debug only. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASANScott Bauer2017-02-151-87/+46Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_KASAN is enabled, compilation fails: block/sed-opal.c: In function 'sed_ioctl': block/sed-opal.c:2447:1: error: the frame size of 2256 bytes is larger than 2048 bytes [-Werror=frame-larger-than=] Moved all the ioctl structures off the stack and dynamically allocate using _IOC_SIZE() Fixes: 455a7b238cd6 ("block: Add Sed-opal library") Reported-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | elevator: fix loading wrong elevator type for blk-mq devicesJens Axboe2017-02-141-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old elevator= boot parameter blindly attempts to load the same scheduler for mq and !mq devices, leading to a crash if we specify the wrong one. Ensure that we only apply this boot parameter to old !mq devices. Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | block: Add Sed-opal libraryScott Bauer2017-02-064-0/+2885
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the necessary logic to bring an Opal enabled drive out of a factory-enabled into a working Opal state. This patch set also enables logic to save a password to be replayed during a resume from suspend. Signed-off-by: Scott Bauer <scott.bauer@intel.com> Signed-off-by: Rafael Antognolli <Rafael.Antognolli@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | block: queue lock must be acquired when iterating over rlsTahsin Erdogan2017-02-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk_set_queue_dying() does not acquire queue lock before it calls blk_queue_for_each_rl(). This allows a racing blkg_destroy() to remove blkg->q_node from the linked list and have blk_queue_for_each_rl() loop infitely over the removed blkg->q_node list node. Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | block: Update comments that refer to __bio_map_user() and bio_map_user()Bart Van Assche2017-02-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since __bio_map_user() and bio_map_user() have been removed, update the comments that still refer to these functions. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> References: commit ddad8dd0a162 ("block: use blk_rq_map_user_iov to implement blk_rq_map_user") Cc: Ming Lei <tom.leiming@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | blk-mq: don't fail allocating driver tag for stopped hw queueJens Axboe2017-02-011-3/+0Star
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We rely on blk_mq_get_driver_tag() not failing if 'wait' is true, but it currently fails in that case if the queue happens to be stopped at the time of the call. We don't need to check for stopped here, it's just assigning the tag. If the queue is stopped, we'll handle it when attempting to run the queue. This fixes a stall/crash on flush intensive workloads, where we proceed to process a flush that doesn't have a valid tag assigned. Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq: fix debugfs compilation issuesOmar Sandoval2017-01-273-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a couple of problems: 1. In the !CONFIG_DEBUG_FS case, the stub definitions were bogus. 2. In the !CONFIG_BLOCK case, blk-mq-debugfs.c shouldn't be compiled at all. Fix the stub definitions and add a CONFIG_BLK_DEBUG_FS Kconfig option. Fixes: 07e4fead45e6 ("blk-mq: create debugfs directory tree") Signed-off-by: Omar Sandoval <osandov@fb.com> Augment Kconfig description. Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: cleanup remaining manual checks for PREFLUSH|FUAJens Axboe2017-01-272-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Use op_is_flush() where applicable. Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-sched: add flush insertion into blk_mq_sched_insert_request()Jens Axboe2017-01-278-55/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of letting the caller check this and handle the details of inserting a flush request, put the logic in the scheduler insertion function. This fixes direct flush insertion outside of the usual make_request_fn calls, like from dm via blk_insert_cloned_request(). Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | block: add a op_is_flush helperChristoph Hellwig2017-01-273-9/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This centralizes the checks for bios that needs to be go into the flush state machine. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq-sched: change ->dispatch_requests() to ->dispatch_request()Jens Axboe2017-01-273-13/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we invoke dispatch_requests(), the scheduler empties everything into the passed in list. This isn't always a good thing, since it means that we remove items that we could have potentially merged with. Change the function to dispatch single requests at the time. If we do that, we can backoff exactly at the point where the device can't consume more IO, and leave the rest with the scheduler for better merging and future dispatch decision making. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
| | * | blk-mq-sched: fix starvation for multiple hardware queues and shared tagsJens Axboe2017-01-274-7/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have both multiple hardware queues and shared tag map between devices, we need to ensure that we propagate the hardware queue restart bit higher up. This is because we can get into a situation where we don't have any IO pending on a hardware queue, yet we fail getting a tag to start new IO. If that happens, it's not enough to mark the hardware queue as needing a restart, we need to bubble that up to the higher level queue as well. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
| | * | blk-mq: release driver tag on a requeue eventJens Axboe2017-01-271-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't want to hold on to this resource when we have a scheduler attached. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
| | * | blk-mq: fix potential race in queue restart and driver tag allocationJens Axboe2017-01-271-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once we mark the queue as needing a restart, re-check if we can get a driver tag. This fixes a theoretical issue where the needed IO completes _after_ blk_mq_get_driver_tag() fails, but before we manage to set the restart bit. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
| | * | blk-mq: improve scheduler queue sync/async runningJens Axboe2017-01-271-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll use the same criteria for whether we need to run the queue sync or async when we have a scheduler, as we do without one. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Tested-by: Hannes Reinecke <hare@suse.com>
| | * | blk-mq: move hctx and ctx counters from sysfs to debugfsOmar Sandoval2017-01-272-64/+181
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These counters aren't as out-of-place in sysfs as the other stuff, but debugfs is a slightly better home for them. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq: move hctx io_poll, stats, and dispatched from sysfs to debugfsOmar Sandoval2017-01-272-92/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These statistics _might_ be useful to userspace, but it's better not to commit to an ABI for these yet. Also, the dispatched file in sysfs couldn't be cleared, so make it clearable like the others in debugfs. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | blk-mq: add tags and sched_tags bitmaps to debugfsOmar Sandoval2017-01-271-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These can be used to debug issues like tag leaks and stuck requests. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>