summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/host/core.c
Commit message (Collapse)AuthorAgeFilesLines
* nvme: fix possible io failures when removing multipathed nsAnton Eidelman2019-07-261-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 2181e455612a8db2761eabbf126640552a451e96 ] When a shared namespace is removed, we call blk_cleanup_queue() when the device can still be accessed as the current path and this can result in submission to a dying queue. Hence, direct_make_request() called by our mpath device may fail (propagating the failure to userspace). Instead, we want to failover this I/O to a different path if one exists. Thus, before we cleanup the request queue, we make sure that the device is cleared from the current path nor it can be selected again as such. Fix this by: - clear the ns from the head->list and synchronize rcu to make sure there is no concurrent path search that restores it as the current path - clear the mpath current path in order to trigger a subsequent path search and sync srcu to wait for any ongoing request submissions - safely continue to namespace removal and blk_cleanup_queue Signed-off-by: Anton Eidelman <anton@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: Fix u32 overflow in the number of namespace list calculationJaesoo Lee2019-06-251-1/+2
| | | | | | | | | | | | | | [ Upstream commit c8e8c77b3bdbade6e26e8e76595f141ede12b692 ] The Number of Namespaces (nn) field in the identify controller data structure is defined as u32 and the maximum allowed value in NVMe specification is 0xFFFFFFFEUL. This change fixes the possible overflow of the DIV_ROUND_UP() operation used in nvme_scan_ns_list() by casting the nn to u64. Signed-off-by: Jaesoo Lee <jalee@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: fix memory leak for power latency toleranceYufen Yu2019-06-191-0/+1
| | | | | | | | | | | | | | | | | [ Upstream commit 510a405d945bc985abc513fafe45890cac34fafa ] Unconditionally hide device pm latency tolerance when uninitializing the controller to ensure all qos resources are released so that we're not leaking this memory. This is safe to call if none were allocated in the first place, or were previously freed. Fixes: c5552fde102fc("nvme: Enable autonomous power state transitions") Suggested-by: Keith Busch <keith.busch@intel.com> Tested-by: David Milburn <dmilburn@redhat.com> Signed-off-by: Yufen Yu <yuyufen@huawei.com> [changelog] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: release namespace SRCU protection before performing controller ioctlsChristoph Hellwig2019-06-191-5/+20
| | | | | | | | | | | | | | | [ Upstream commit 5fb4aac756acacf260b9ebd88747251effa3a2f2 ] Holding the SRCU critical section protecting the namespace list can cause deadlocks when using the per-namespace admin passthrough ioctl to delete as namespace. Release it earlier when performing per-controller ioctls to avoid that. Reported-by: Kenneth Heitke <kenneth.heitke@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: merge nvme_ns_ioctl into nvme_ioctlChristoph Hellwig2019-06-191-23/+24
| | | | | | | | | | | [ Upstream commit 90ec611adcf20b96d0c2b7166497d53e4301a57f ] Merge the two functions to make future changes a little easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: remove the ifdef around nvme_nvm_ioctlChristoph Hellwig2019-06-191-2/+0Star
| | | | | | | | | | | | [ Upstream commit 3f98bcc58cd5f1e4668db289dcab771874cc0920 ] We already have a proper stub if lightnvm is not enabled, so don't bother with the ifdef. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: fix srcu locking on error return in nvme_get_ns_from_diskChristoph Hellwig2019-06-191-4/+9
| | | | | | | | | | | | | [ Upstream commit 100c815cbd56480b3e31518475b04719c363614a ] If we can't get a namespace don't leak the SRCU lock. nvme_ioctl was working around this, but nvme_pr_command wasn't handling this properly. Just do what callers would usually expect. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: set 0 capacity if namespace block size exceeds PAGE_SIZESagi Grimberg2019-05-311-1/+6
| | | | | | | | | | | | | | [ Upstream commit 01fa017484ad98fccdeaab32db0077c574b6bd6f ] If our target exposed a namespace with a block size that is greater than PAGE_SIZE, set 0 capacity on the namespace as we do not support it. This issue encountered when the nvmet namespace was backed by a tempfile. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: lock NS list changes while handling command effectsKeith Busch2019-03-131-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit e7ad43c3eda6a1690c4c3c341f95dc1c6898da83 ] If a controller supports the NS Change Notification, the namespace scan_work is automatically triggered after attaching a new namespace. Occasionally the namespace scan_work may append the new namespace to the list before the admin command effects handling is completed. The effects handling unfreezes namespaces, but if it unfreezes the newly attached namespace, its request_queue freeze depth will be off and we'll hit the warning in blk_mq_unfreeze_queue(). On the next namespace add, we will fail to freeze that queue due to the previous bad accounting and deadlock waiting for frozen. Fix that by preventing scan work from altering the namespace list while command effects handling needs to pair freeze with unfreeze. Reported-by: Wen Xiong <wenxiong@us.ibm.com> Tested-by: Wen Xiong <wenxiong@us.ibm.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: pad fake subsys NQN vid and ssvid with zerosKeith Busch2019-02-201-1/+1
| | | | | | | | | | | [ Upstream commit 3da584f57133e51aeb84aaefae5e3d69531a1e4f ] We need to preserve the leading zeros in the vid and ssvid when generating a unique NQN. Truncating these may lead to naming collisions. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: validate controller state before rescheduling keep aliveJames Smart2018-12-211-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 86880d646122240596d6719b642fee3213239994 ] Delete operations are seeing NULL pointer references in call_timer_fn. Tracking these back, the timer appears to be the keep alive timer. nvme_keep_alive_work() which is tied to the timer that is cancelled by nvme_stop_keep_alive(), simply starts the keep alive io but doesn't wait for it's completion. So nvme_stop_keep_alive() only stops a timer when it's pending. When a keep alive is in flight, there is no timer running and the nvme_stop_keep_alive() will have no affect on the keep alive io. Thus, if the io completes successfully, the keep alive timer will be rescheduled. In the failure case, delete is called, the controller state is changed, the nvme_stop_keep_alive() is called while the io is outstanding, and the delete path continues on. The keep alive happens to successfully complete before the delete paths mark it as aborted as part of the queue termination, so the timer is restarted. The delete paths then tear down the controller, and later on the timer code fires and the timer entry is now corrupt. Fix by validating the controller state before rescheduling the keep alive. Testing with the fix has confirmed the condition above was hit. Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: flush namespace scanning work just before removing namespacesSagi Grimberg2018-12-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | [ Upstream commit f6c8e432cb0479255322c5d0335b9f1699a0270c ] nvme_stop_ctrl can be called also for reset flow and there is no need to flush the scan_work as namespaces are not being removed. This can cause deadlock in rdma, fc and loop drivers since nvme_stop_ctrl barriers before controller teardown (and specifically I/O cancellation of the scan_work itself) takes place, but the scan_work will be blocked anyways so there is no need to flush it. Instead, move scan_work flush to nvme_remove_namespaces() where it really needs to flush. Reported-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed by: James Smart <jsmart2021@gmail.com> Tested-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: make sure ns head inherits underlying device limitsSagi Grimberg2018-11-271-1/+3
| | | | | | | | | | | | | | | | [ Upstream commit 8f676b8508c250bbe255096522fdefb73f1ea0b9 ] Whenever we update ns_head info, we need to make sure it is still compatible with all underlying backing devices because although nvme multipath doesn't have any explicit use of these limits, other devices can still be stacked on top of it which may rely on the underlying limits. Start with unlimited stacking limits, and every info update iterate over siblings and adjust queue limits. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
* nvme: remove ns sibling before clearing pathKeith Busch2018-10-081-1/+1
| | | | | | | | | | | | | | | The code had been clearing a namespace being deleted as the current path while that namespace was still in the path siblings list. It is possible a new IO could set that namespace back to the current path since it appeared to be an eligable path to select, which may result in a use-after-free error. This patch ensures a namespace being removed is not eligable to be reset as a current path prior to clearing it as the current path. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: set gendisk read only based on nsattrChaitanya Kulkarni2018-08-081-0/+6
| | | | | | | | | | NVMe 1.3 TP 4005 introduces new filed (NSATTR). This field indicates whether given namespace is write protected or not. This patch sets the gendisk associated with the namespace to read only based on the identify namespace nsattr field. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block2Jens Axboe2018-08-061-26/+43
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe changes from Christoph: "This contains the support for TP4004, Asymmetric Namespace Access, which makes NVMe multipathing usable in practice." * 'nvme-4.19' of git://git.infradead.org/nvme: nvmet: use Retain Async Event bit to clear AEN nvmet: support configuring ANA groups nvmet: add minimal ANA support nvmet: track and limit the number of namespaces per subsystem nvmet: keep a port pointer in nvmet_ctrl nvme: add ANA support nvme: remove nvme_req_needs_failover nvme: simplify the API for getting log pages nvme.h: add ANA definitions nvme.h: add support for the log specific field Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme: add ANA supportChristoph Hellwig2018-07-271-8/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for Asynchronous Namespace Access as specified in NVMe 1.3 TP 4004. With ANA each namespace attached to a controller belongs to an ANA group that describes the characteristics of accessing the namespaces through this controller. In the optimized and non-optimized states namespaces can be accessed regularly, although in a multi-pathing environment we should always prefer to access a namespace through a controller where an optimized relationship exists. Namespaces in Inaccessible, Permanent-Loss or Change state for a given controller should not be accessed. The states are updated through reading the ANA log page, which is read once during controller initialization, whenever the ANA change notice AEN is received, or when one of the ANA specific status codes that signal a state change is received on a command. The ANA state is kept in the nvme_ns structure, which makes the checks in the fast path very simple. Updating the ANA state when reading the log page is also very simple, the only downside is that finding the initial ANA state when scanning for namespaces is a bit cumbersome. The gendisk for a ns_head is only registered once a live path for it exists. Without that the kernel would hang during partition scanning. Includes fixes and improvements from Hannes Reinecke. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * nvme: remove nvme_req_needs_failoverChristoph Hellwig2018-07-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Now that we just call out to blk_path_error there isn't really any good reason to not merge it into the only caller. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * nvme: simplify the API for getting log pagesChristoph Hellwig2018-07-271-21/+11Star
| | | | | | | | | | | | | | | | | | | | | | | | | | Merge nvme_get_log and nvme_get_log_ext into a single helper, which takes a plain nsid instead of the nvme_ns pointer. Also add support for the log specific field while we're at it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
* | Merge tag 'v4.18-rc6' into for-4.19/block2Jens Axboe2018-08-061-29/+34
|\ \ | | | | | | | | | | | | | | | | | | Pull in 4.18-rc6 to get the NVMe core AEN change to avoid a merge conflict down the line. Signed-of-by: Jens Axboe <axboe@kernel.dk>
| * | nvme: fix handling of metadata_len for NVME_IOCTL_IO_CMDRoland Dreier2018-07-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The old code in nvme_user_cmd() passed the userspace virtual address from nvme_passthru_cmd.metadata as the length of the metadata buffer as well as the address to nvme_submit_user_cmd(). Fixes: 63263d60 ("nvme: Use metadata for passthrough commands") Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: don't enable AEN if not supportedWeiping Zhang2018-07-171-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid excuting set_feature command if there is no supported bit in Optional Asynchronous Events Supported (OAES). Fixes: c0561f82 ("nvme: submit AEN event configuration on startup") Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Weiping Zhang <zhangweiping@didichuxing.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: ensure forward progress during Admin passthruScott Bauer2018-07-171-24/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the controller supports effects and goes down during the passthru admin command we will deadlock during namespace revalidation. [ 363.488275] INFO: task kworker/u16:5:231 blocked for more than 120 seconds. [ 363.488290] Not tainted 4.17.0+ #2 [ 363.488296] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.488303] kworker/u16:5 D 0 231 2 0x80000000 [ 363.488331] Workqueue: nvme-reset-wq nvme_reset_work [nvme] [ 363.488338] Call Trace: [ 363.488385] schedule+0x75/0x190 [ 363.488396] rwsem_down_read_failed+0x1c3/0x2f0 [ 363.488481] call_rwsem_down_read_failed+0x14/0x30 [ 363.488504] down_read+0x1d/0x80 [ 363.488523] nvme_stop_queues+0x1e/0xa0 [nvme_core] [ 363.488536] nvme_dev_disable+0xae4/0x1620 [nvme] [ 363.488614] nvme_reset_work+0xd1e/0x49d9 [nvme] [ 363.488911] process_one_work+0x81a/0x1400 [ 363.488934] worker_thread+0x87/0xe80 [ 363.488955] kthread+0x2db/0x390 [ 363.488977] ret_from_fork+0x35/0x40 Fixes: 84fef62d135b6 ("nvme: check admin passthru command effects") Signed-off-by: Scott Bauer <scott.bauer@intel.com> Reviewed-by: Keith Busch <keith.busch@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | nvme: use blk API to remap ref tags for IOs with metadataMax Gurtovoy2018-07-301-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also moved the logic of the remapping to the nvme core driver instead of implementing it in the nvme pci driver. This way all the other nvme transport drivers will benefit from it (in case they'll implement metadata support). Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | block: move ref_tag calculation func to the block layerMax Gurtovoy2018-07-301-2/+1Star
| |/ |/| | | | | | | | | | | | | | | | | | | Currently this function is implemented in the scsi layer, but it's actual place should be the block layer since T10-PI is a general data integrity feature that is used in the nvme protocol as well. Suggested-by: Christoph Hellwig <hch@lst.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | nvme: use hw qid in trace eventsKeith Busch2018-07-231-4/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can not match a command to its completion based on the command id alone. We need the submitting queue identifier to pair with the completion, so this patch adds that to the trace buffer. This patch is also collapsing the admin and IO submission traces into a single one so we don't need to duplicate this and creating unnecessary code branches: we know if the command is an admin vs IO based on the qid. And since we're here, the patch fixes code formatting in the area. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> [hch: move the qid helper to nvme.h and made it an inline function] Signed-off-by: Christoph Hellwig <hch@lst.de>
* | nvme: move init of keep_alive work item to controller initializationJames Smart2018-07-231-3/+4
|/ | | | | | | | | | | | | | | Currently, the code initializes the keep alive work item whenever nvme_start_keep_alive() is called. However, this routine is called several times while reconnecting, etc. Although it's hoped that keep alive is always disabled and not scheduled when start is called, re-initing if it were scheduled or completing can have very bad side effects. There's no need for re-initialization. Move the keep_alive work item and cmd struct initialization to controller init. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-pci: limit max IO size and segments to avoid high order allocationsJens Axboe2018-06-211-0/+1
| | | | | | | | | | | | | | | | | | | nvme requires an sg table allocation for each request. If the request is large, then the allocation can become quite large. For instance, with our default software settings of 1280KB IO size, we'll need 10248 bytes of sg table. That turns into a 2nd order allocation, which we can't always guarantee. If we fail the allocation, blk-mq will retry it later. But there's no guarantee that we'll EVER be able to allocate that much contigious memory. Limit the IO size such that we never need more than a single page of memory. That's a lot faster and more reliable. Then back that allocation with a mempool, so that we know we'll always be able to succeed the allocation at some point. Signed-off-by: Jens Axboe <axboe@kernel.dk> Acked-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-linusJens Axboe2018-06-151-36/+12Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe fixes from Christoph: "Fix various little regressions introduced in this merge window, plus a rework of the fibre channel connect and reconnect path to share the code instead of having separate sets of bugs. Last but not least a trivial trace point addition from Hannes." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme-fabrics: fix and refine state checks in __nvmf_check_ready nvme-fabrics: handle the admin-only case properly in nvmf_check_ready nvme-fabrics: refactor queue ready check blk-mq: remove blk_mq_tagset_iter nvme: remove nvme_reinit_tagset nvme-fc: fix nulling of queue data on reconnect nvme-fc: remove reinit_request routine nvme-fc: change controllers first connect to use reconnect path nvme: don't rely on the changed namespace list log nvmet: free smart-log buffer after use nvme-rdma: fix error flow during mapping request data nvme: add bio remapping tracepoint nvme: fix NULL pointer dereference in nvme_init_subsystem
| * nvme: remove nvme_reinit_tagsetChristoph Hellwig2018-06-141-10/+0Star
| | | | | | | | | | | | | | Unused now that all transports stopped using it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jens Axboe <axboe@kernel.dk>
| * nvme: don't rely on the changed namespace list logChristoph Hellwig2018-06-131-25/+11Star
| | | | | | | | | | | | | | | | | | | | | | Don't optimize our namespace rescan based on the changed namespace list log page as userspace might have changed the content through reading it. Suggested-by: Keith Busch <keith.busch@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@linux.intel.com> Reviewed-by: Hannes Reinecke <hare@suse.com>
| * nvme: fix NULL pointer dereference in nvme_init_subsystemIsrael Rukshin2018-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | When using nvme-pci driver the nvmf_ctrl_options is NULL. There is no need to check for discovery_nqn flag at non-fabrics controller. Fixes: 181303d0 ("nvme-fabrics: allow duplicate connections to the discovery controller") Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | Merge tag 'for-linus-20180608' of git://git.kernel.dk/linux-blockLinus Torvalds2018-06-081-2/+2
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A few fixes for this merge window, where some of them should go in sooner rather than later, hence a new pull this week. This pull request contains: - Set of NVMe fixes, mostly follow up cleanups/fixes to the queue changes, but also teardown/removal and misc changes (Christop/Dan/ Johannes/Sagi/Steve). - Two lightnvm fixes for issues that showed up in this window (Colin/Wei). - Failfast/driver flags inheritance for flush requests (Hannes). - The md device put sanitization and fix (Kent). - dm bio_set inheritance fix (me). - nbd discard granularity fix (Josef). - nbd consistency in command printing (Kevin). - Loop recursion validation fix (Ted). - Partition overlap check (Wang)" [ .. and now my build is warning-free again thanks to the md fix - Linus ] * tag 'for-linus-20180608' of git://git.kernel.dk/linux-block: (22 commits) nvme: cleanup double shift issue nvme-pci: make CMB SQ mod-param read-only nvme-pci: unquiesce dead controller queues nvme-pci: remove HMB teardown on reset nvme-pci: queue creation fixes nvme-pci: remove unnecessary completion doorbell check nvme-pci: remove unnecessary nested locking nvmet: filter newlines from user input nvme-rdma: correctly check for target keyed sgl support nvme: don't hold nvmf_transports_rwsem for more than transport lookups nvmet: return all zeroed buffer when we can't find an active namespace md: Unify mddev destruction paths dm: use bioset_init_from_src() to copy bio_set block: add bioset_init_from_src() helper block: always set partition number to '0' in blk_partition_remap() block: pass failfast and driver-specific flags to flush requests nbd: set discard_alignment to the granularity nbd: Consistently use request pointer in debug messages. block: add verifier for cmdline partition lightnvm: pblk: fix resource leak of invalid_bitmap ...
| * nvme: cleanup double shift issueDan Carpenter2018-06-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem here is that set_bit() and test_bit() take a bit number so we should be passing 0 but instead we're passing (1 << 0) which leads to a double shift. It doesn't cause a runtime bug in the current code because it's done consistently and we only set that one bit. I decided to just re-use NVME_AER_NOTICE_NS_CHANGED instead of introducing a new define for this. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds2018-06-051-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: - updates to the handling of expedited grace periods - updates to reduce lock contention in the rcu_node combining tree [ These are in preparation for the consolidation of RCU-bh, RCU-preempt, and RCU-sched into a single flavor, which was requested by Linus in response to a security flaw whose root cause included confusion between the multiple flavors of RCU ] - torture-test updates that save their users some time and effort - miscellaneous fixes * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (44 commits) rcu/x86: Provide early rcu_cpu_starting() callback torture: Make kvm-find-errors.sh find build warnings rcutorture: Abbreviate kvm.sh summary lines rcutorture: Print end-of-test state in kvm.sh summary rcutorture: Print end-of-test state torture: Fold parse-torture.sh into parse-console.sh torture: Add a script to edit output from failed runs rcu: Update list of rcu_future_grace_period() trace events rcu: Drop early GP request check from rcu_gp_kthread() rcu: Simplify and inline cpu_needs_another_gp() rcu: The rcu_gp_cleanup() function does not need cpu_needs_another_gp() rcu: Make rcu_start_this_gp() check for out-of-range requests rcu: Add funnel locking to rcu_start_this_gp() rcu: Make rcu_start_future_gp() caller select grace period rcu: Inline rcu_start_gp_advanced() into rcu_start_future_gp() rcu: Clear request other than RCU_GP_FLAG_INIT at GP end rcu: Cleanup, don't put ->completed into an int rcu: Switch __rcu_process_callbacks() to rcu_accelerate_cbs() rcu: Avoid __call_rcu_core() root rcu_node ->lock acquisition rcu: Make rcu_migrate_callbacks wake GP kthread when needed ...
| * Merge branch 'for-mingo' of ↵Ingo Molnar2018-05-161-1/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu - Updates to the handling of expedited grace periods, perhaps most notably parallelizing their initialization. Other changes include fixes from Boqun Feng. - Miscellaneous fixes. These include an nvme fix from Nitzan Carmi that I am carrying because it depends on a new SRCU function cleanup_srcu_struct_quiesced(). This branch also includes fixes from Byungchul Park and Yury Norov. - Updates to reduce lock contention in the rcu_node combining tree. These are in preparation for the consolidation of RCU-bh, RCU-preempt, and RCU-sched into a single flavor, which was requested by Linus Torvalds in response to a security flaw whose root cause included confusion between the multiple flavors of RCU. - Torture-test updates that save their users some time and effort. Conflicts: drivers/nvme/host/core.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * nvme: Avoid flush dependency in delete controller flowNitzan Carmi2018-05-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nvme_delete_ctrl() function queues a work item on a MEM_RECLAIM queue (nvme_delete_wq), which eventually calls cleanup_srcu_struct(), which in turn flushes a delayed work from an !MEM_RECLAIM queue. This is unsafe as we might trigger deadlocks under severe memory pressure. Since we don't ever invoke call_srcu(), it is safe to use the shiny new _quiesced() version of srcu cleanup, thus avoiding that flush dependency. This commit makes that change. Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Tested-by: Nicholas Piggin <npiggin@gmail.com>
* | | Merge tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-blockLinus Torvalds2018-06-041-45/+112
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: - clean up how we pass around gfp_t and blk_mq_req_flags_t (Christoph) - prepare us to defer scheduler attach (Christoph) - clean up drivers handling of bounce buffers (Christoph) - fix timeout handling corner cases (Christoph/Bart/Keith) - bcache fixes (Coly) - prep work for bcachefs and some block layer optimizations (Kent). - convert users of bio_sets to using embedded structs (Kent). - fixes for the BFQ io scheduler (Paolo/Davide/Filippo) - lightnvm fixes and improvements (Matias, with contributions from Hans and Javier) - adding discard throttling to blk-wbt (me) - sbitmap blk-mq-tag handling (me/Omar/Ming). - remove the sparc jsflash block driver, acked by DaveM. - Kyber scheduler improvement from Jianchao, making it more friendly wrt merging. - conversion of symbolic proc permissions to octal, from Joe Perches. Previously the block parts were a mix of both. - nbd fixes (Josef and Kevin Vigor) - unify how we handle the various kinds of timestamps that the block core and utility code uses (Omar) - three NVMe pull requests from Keith and Christoph, bringing AEN to feature completeness, file backed namespaces, cq/sq lock split, and various fixes - various little fixes and improvements all over the map * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits) blk-mq: update nr_requests when switching to 'none' scheduler block: don't use blocking queue entered for recursive bio submits dm-crypt: fix warning in shutdown path lightnvm: pblk: take bitmap alloc. out of critical section lightnvm: pblk: kick writer on new flush points lightnvm: pblk: only try to recover lines with written smeta lightnvm: pblk: remove unnecessary bio_get/put lightnvm: pblk: add possibility to set write buffer size manually lightnvm: fix partial read error path lightnvm: proper error handling for pblk_bio_add_pages lightnvm: pblk: fix smeta write error path lightnvm: pblk: garbage collect lines with failed writes lightnvm: pblk: rework write error recovery path lightnvm: pblk: remove dead function lightnvm: pass flag on graceful teardown to targets lightnvm: pblk: check for chunk size before allocating it lightnvm: pblk: remove unnecessary argument lightnvm: pblk: remove unnecessary indirection lightnvm: pblk: return NVM_ error on failed submission lightnvm: pblk: warn in case of corrupted write buffer ...
| * \ \ Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-4.18/blockJens Axboe2018-06-011-25/+90
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe changes from Christoph: "Below is another set of NVMe updates for 4.18. Besides the usual bug fixes this includes more feature completness in terms of AEN and log page handling on the target." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme: use the changed namespaces list log to clear ns data changed AENs nvme: mark nvme_queue_scan static nvme: submit AEN event configuration on startup nvmet: mask pending AENs nvmet: add AEN configuration support nvmet: implement the changed namespaces log nvmet: split log page implementation nvmet: add a new nvmet_zero_sgl helper nvme.h: add AEN configuration symbols nvme.h: add the changed namespace list log nvme.h: untangle AEN notice definitions nvmet: fix error return code in nvmet_file_ns_enable() nvmet: fix a typo in nvmet_file_ns_enable() nvme-fabrics: allow internal passthrough command on deleting controllers nvme-loop: add support for multiple ports nvme-pci: simplify __nvme_submit_cmd nvme-pci: Rate limit the nvme timeout warnings nvme: allow duplicate controller if prior controller being deleted
| | * | | nvme: use the changed namespaces list log to clear ns data changed AENsChristoph Hellwig2018-06-011-4/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Per section 5.2 we need to issue the corresponding log page to clear an AEN, so for a namespace data changed AEN we need to read the changed namespace list log. And once we read that log anyway we might as well use it to optimize the rescan. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| | * | | nvme: mark nvme_queue_scan staticChristoph Hellwig2018-06-011-10/+9Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And move it toward the top of the file to avoid a forward declaration. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| | * | | nvme: submit AEN event configuration on startupHannes Reinecke2018-06-011-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should register for AEN events; some law-abiding targets might not be sending us AENs otherwise. Signed-off-by: Hannes Reinecke <hare@suse.com> [hch: slight cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
| | * | | nvme.h: untangle AEN notice definitionsChristoph Hellwig2018-05-311-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stop including the event type in the definitions for the notice type. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| * | | | blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iterChristoph Hellwig2018-05-301-3/+0Star
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already check for started commands in all callbacks, but we should also protect against already completed commands. Do this by taking the checks to common code. Acked-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | nvme: fixup memory leak in nvme_init_identify()Hannes Reinecke2018-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If nvme_get_effects_log() failed the 'id' buffer from the previous nvme_identify_ctrl() call will never be freed. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | nvme-fabrics: allow duplicate connections to the discovery controllerHannes Reinecke2018-05-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The whole point of the discovery controller is that it can accept multiple connections. Additionally the cmic field is not even defined for the discovery controller identify page. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | | nvme: host: core: fix precedence of ternary operatorIvan Bornyakov2018-05-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ternary operator have lower precedence then bitwise or, so 'cdw10' was calculated wrong. Signed-off-by: Ivan Bornyakov <brnkv.i1@gmail.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Keith Busch <keith.busch@intel.com>
| * | | Merge branch 'nvme-4.18' of git://git.infradead.org/nvme into for-4.18/blockJens Axboe2018-05-211-13/+17
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe changes from Keith: "This is just the first nvme pull request for 4.18. There are several fabrics and target patches that I missed, so there will be more to come." * 'nvme-4.18' of git://git.infradead.org/nvme: nvme-pci: drop IRQ disabling on submission queue lock nvme-pci: split the nvme queue lock into submission and completion locks nvme-pci: handle completions outside of the queue lock nvme-pci: move ->cq_vector == -1 check outside of ->q_lock nvme-pci: remove cq check after submission nvme-pci: simplify nvme_cqe_valid nvme: mark the result argument to nvme_complete_async_event volatile nvme/pci: Sync controller reset for AER slot_reset nvme/pci: Hold controller reference during async probe nvme: only reconfigure discard if necessary nvme/pci: Use async_schedule for initial reset work nvme: lightnvm: add granby support NVMe: Add Quirk Delay before CHK RDY for Seagate Nytro Flash Storage nvme: change order of qid and cmdid in completion trace nvme: fc: provide a descriptive error
| | * | | nvme: mark the result argument to nvme_complete_async_event volatileChristoph Hellwig2018-05-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll need that in the PCIe driver soon as we'll read it straight off the CQ. Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * | | nvme: only reconfigure discard if necessaryJens Axboe2018-05-021-12/+16
| | | |/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently nvme reconfigures discard for every disk revalidation. This is problematic because any O_WRONLY or O_RDWR open will trigger a partition scan through udev/systemd, and we will reconfigure discard. This blows away any user settings, like discard_max_bytes. Only re-configure the user settable settings if we need to. Signed-off-by: Jens Axboe <axboe@kernel.dk> [removed redundant queue flag setting] Signed-off-by: Keith Busch <keith.busch@intel.com>