summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/host/fc.c
Commit message (Collapse)AuthorAgeFilesLines
* nvme_fc: on remoteport reuse, set new nport_id and role.James Smart2018-03-261-0/+2
| | | | | | | | | | When reattaching to a removed remoteport that has not yet been fully deleted as it's waiting for reconnect timeouts, be sure to re-set the ports nport id and role. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme_fc: fix abort race on teardown with lld rejectJames Smart2018-03-261-1/+4
| | | | | | | | | | | | | | | | | | | | Another abort race: An io request is started, becomes active, and is attempted to be started with the lldd. At the same time the controller is stopped/torndown and an itterator is run to abort the ios. As the io is active, it is added to the outstanding aborted io count. However on the original io request thread, the driver ends up rejecting the io due to the condition that induced the controller teardown. The driver reject path didn't check whether it was in the outstanding io count. This left the count outstanding stopping controller teardown. Correct by, in the driver reject case, setting the state to inactive and checking whether it was in the outstanding io count. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme_fc: io timeout should defer abort to ctrl resetJames Smart2018-03-261-11/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | The current nvme_fc code, when an io times out, will abort the io on the fc link, then call the error recovery routine to reset the controller. It is during the reset of the controller that the transport will wait for all ios to be aborted before sending a Disconnect LS to the target. However, the reset routine only waits for the io which it generates the abort for to complete. Any io that was aborted just prior to the reset isn't in it's list to wait for. Thus the Disconnect is getting sent before the aborts have completed. Correct by removing the abort in the timeout handler. The reset will generate the abort. At that point the timeout handler can be simplified to request the reset (via the error handler) and restart the timeout timer. Also fixes a small typo in a comment in the reset handler. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme_fc: fix ctrl create failures racing with workq itemsJames Smart2018-03-261-0/+4
| | | | | | | | | | | | | | | | | | | If there are errors during initial controller create, the transport will teardown the partially initialized controller struct and free the ctlr memory. Trouble is - most of those errors can occur due to asynchronous events happening such io timeouts and subsystem connectivity failures. Those failures invoke async workq items to reset the controller and attempt reconnect. Those may be in progress as the main thread frees the ctrl memory, resulting in NULL ptr oops. Prevent this from happening by having the main ctrl failure thread changing state to DELETING followed by synchronously cancelling any pending queued work item. The change of state will prevent the scheduling of resets or reconnect events. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme: centralize ctrl removal printsMax Gurtovoy2018-03-261-8/+5Star
| | | | | | | | | | | | nvme_delete_ctrl can be called from various contexts in parallel, and cause duplicated information prints, even though the specific context doesn't perform the actual removal. Instead, print the information when the actual removal occurs. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme_fc: cleanup io completionJames Smart2018-02-111-51/+12Star
| | | | | | | | | | | | | | | | | | There was some old cold that dealt with complete_rq being called prior to the lldd returning the io completion. This is garbage code. The complete_rq routine was being called after eh_timeouts were called and it was due to eh_timeouts not being handled properly. The timeouts were fixed in prior patches so that in general, a timeout will initiate an abort and the reset timer restarted as the abort operation will take care of completing things. Given the reset timer restarted, the erroneous complete_rq calls were eliminated. So remove the work that was synchronizing complete_rq with io completion. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* nvme_fc: correct abort race condition on resetsJames Smart2018-02-111-72/+26Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | During reset handling, there is live io completing while the reset is taking place. The reset path attempts to abort all outstanding io, counting the number of ios that were reset. It then waits for those ios to be reclaimed from the lldd before continuing. The transport's logic on io state and flag setting was poor, allowing ios to complete simultaneous to the abort request. The completed ios were counted, but as the completion had already occurred, the completion never reduced the count. As the count never zeros, the reset/delete never completes. Tighten it up by unconditionally changing the op state to completed when the io done handler is called. The reset/abort path now changes the op state to aborted, but the abort only continues if the op state was live priviously. If complete, the abort is backed out. Thus proper counting of io aborts and their completions is working again. Also removed the TERMIO state on the op as it's redundant with the op's aborted state. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTINGMax Gurtovoy2018-02-081-7/+7
| | | | | | | | | In pci transport, this state is used to mark the initialization process. This should be also used in other transports as well. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
* blk-mq: introduce BLK_STS_DEV_RESOURCEMing Lei2018-01-311-10/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This status is returned from driver to block layer if device related resource is unavailable, but driver can guarantee that IO dispatch will be triggered in future when the resource is available. Convert some drivers to return BLK_STS_DEV_RESOURCE. Also, if driver returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls. BLK_MQ_DELAY_QUEUE is 3 ms because both scsi-mq and nvmefc are using that magic value. If a driver can make sure there is in-flight IO, it is safe to return BLK_STS_DEV_RESOURCE because: 1) If all in-flight IOs complete before examining SCHED_RESTART in blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue is run immediately in this case by blk_mq_dispatch_rq_list(); 2) if there is any in-flight IO after/when examining SCHED_RESTART in blk_mq_dispatch_rq_list(): - if SCHED_RESTART isn't set, queue is run immediately as handled in 1) - otherwise, this request will be dispatched after any in-flight IO is completed via blk_mq_sched_restart() 3) if SCHED_RESTART is set concurently in context because of BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two cases and make sure IO hang can be avoided. One invariant is that queue will be rerun if SCHED_RESTART is set. Suggested-by: Jens Axboe <axboe@kernel.dk> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-blockLinus Torvalds2018-01-291-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: "This is the main pull request for block IO related changes for the 4.16 kernel. Nothing major in this pull request, but a good amount of improvements and fixes all over the map. This contains: - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and Paolo. - Support for SMR zones for deadline and mq-deadline from Damien and Christoph. - Set of fixes for bcache by way of Michael Lyle, including fixes from himself, Kent, Rui, Tang, and Coly. - Series from Matias for lightnvm with fixes from Hans Holmberg, Javier, and Matias. Mostly centered around pblk, and the removing rrpc 1.2 in preparation for supporting 2.0. - A couple of NVMe pull requests from Christoph. Nothing major in here, just fixes and cleanups, and support for command tracing from Johannes. - Support for blk-throttle for tracking reads and writes separately. From Joseph Qi. A few cleanups/fixes also for blk-throttle from Weiping. - Series from Mike Snitzer that enables dm to register its queue more logically, something that's alwways been problematic on dm since it's a stacked device. - Series from Ming cleaning up some of the bio accessor use, in preparation for supporting multipage bvecs. - Various fixes from Ming closing up holes around queue mapping and quiescing. - BSD partition fix from Richard Narron, fixing a problem where we can't mount newer (10/11) FreeBSD partitions. - Series from Tejun reworking blk-mq timeout handling. The previous scheme relied on atomic bits, but it had races where we would think a request had timed out if it to reused at the wrong time. - null_blk now supports faking timeouts, to enable us to better exercise and test that functionality separately. From me. - Kill the separate atomic poll bit in the request struct. After this, we don't use the atomic bits on blk-mq anymore at all. From me. - sgl_alloc/free helpers from Bart. - Heavily contended tag case scalability improvement from me. - Various little fixes and cleanups from Arnd, Bart, Corentin, Douglas, Eryu, Goldwyn, and myself" * 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits) block: remove smart1,2.h nvme: add tracepoint for nvme_complete_rq nvme: add tracepoint for nvme_setup_cmd nvme-pci: introduce RECONNECTING state to mark initializing procedure nvme-rdma: remove redundant boolean for inline_data nvme: don't free uuid pointer before printing it nvme-pci: Suspend queues after deleting them bsg: use pr_debug instead of hand crafted macros blk-mq-debugfs: don't allow write on attributes with seq_operations set nvme-pci: Fix queue double allocations block: Set BIO_TRACE_COMPLETION on new bio during split blk-throttle: use queue_is_rq_based block: Remove kblockd_schedule_delayed_work{,_on}() blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly() lib/scatterlist: Fix chaining support in sgl_alloc_order() blk-throttle: track read and write request individually block: add bdev_read_only() checks to common helpers block: fail op_is_write() requests to read-only partitions blk-throttle: export io_serviced_recursive, io_service_bytes_recursive ...
| * nvme-fc: correct hang in nvme_ns_remove()James Smart2018-01-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When connectivity is lost to a device, the association is terminated and the blk-mq queues are quiesced/stopped. When connectivity is re-established, they are resumed. If connectivity is lost for a sufficient amount of time that the controller is then deleted, the delete path starts tearing down queues, and eventually calling nvme_ns_remove(). It appears that pending commands may cause blk_cleanup_queue() to never complete and the teardown stalls. Correct by starting the ns queues after transitioning to a DELETING state, allowing pending commands to be flushed with io failures. Thus the delete path is clear when reached. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: fix rogue admin cmds stalling teardownJames Smart2018-01-171-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When connectivity is lost to a device, the association is terminated and the blk-mq queues are quiesced/stopped. When connectivity is re-established, they are resumed. If an admin command is received while connectivity is list, the ioctl queues the command on the admin_q and the command stalls (the thread issuing the ioctl hangs/waits). if the connectivity is lost long enough such that the controller is then deleted, the delete code makes its calls to initiate the delete, which then expects the core layer to call the transport when all references are removed and the controller can be freed. Unfortunately, nothing in this path dequeued the admin command, so a reference sits outstanding and things stop, hanging the delete indefinitely. Correct by unquiescing the admin queue in the delete association. This means any admin command (which should only be from an ioctl) issued after connectivity is lost will detect the controller is in a reconnecting state and will (fast) fail the command. Thus, a pending reference can no longer be created. Once connectivity is re-established, a new ioctl/admin command would see proper device state and function again. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fabrics: protect against module unload during create_ctrlRoy Shterman2018-01-081-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NVMe transport driver module unload may (and usually does) trigger iteration over the active controllers and delete them all (sometimes under a mutex). However, a controller can be created concurrently with module unload which can lead to leakage of resources (most important char device node leakage) in case the controller creation occured after the unload delete and drain sequence. To protect against this, we take a module reference to guarantee that the nvme transport driver is not unloaded while creating a controller. Signed-off-by: Roy Shterman <roys@lightbitslabs.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | nvme-fc: remove double put reference if admin connect failsJames Smart2017-12-151-1/+0Star
|/ | | | | | | | | | | There are two put references in the failure case of initial create_association. The first put actually frees the controller, thus the second put references freed memory. Remove the unnecessary 2nd put. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-fc: don't use bit masks for set/test_bit() numbersJens Axboe2017-11-241-2/+2
| | | | | | | | So far harmless, but it's confusing and a bug waiting to happen if the shifts grow larger than 4. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme-fc: check if queue is ready in queue_rqSagi Grimberg2017-11-201-1/+18
| | | | | | | | | | | | In case the queue is not LIVE (fully functional and connected at the nvmf level), we cannot allow any commands other than connect to pass through. Add a new queue state flag NVME_FC_Q_LIVE which is set after nvmf connect and cleared in queue teardown. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-11-151-201/+592
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block layer updates from Jens Axboe: "This is the main pull request for block storage for 4.15-rc1. Nothing out of the ordinary in here, and no API changes or anything like that. Just various new features for drivers, core changes, etc. In particular, this pull request contains: - A patch series from Bart, closing the whole on blk/scsi-mq queue quescing. - A series from Christoph, building towards hidden gendisks (for multipath) and ability to move bio chains around. - NVMe - Support for native multipath for NVMe (Christoph). - Userspace notifications for AENs (Keith). - Command side-effects support (Keith). - SGL support (Chaitanya Kulkarni) - FC fixes and improvements (James Smart) - Lots of fixes and tweaks (Various) - bcache - New maintainer (Michael Lyle) - Writeback control improvements (Michael) - Various fixes (Coly, Elena, Eric, Liang, et al) - lightnvm updates, mostly centered around the pblk interface (Javier, Hans, and Rakesh). - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph) - Writeback series that fix the much discussed hundreds of millions of sync-all units. This goes all the way, as discussed previously (me). - Fix for missing wakeup on writeback timer adjustments (Yafang Shao). - Fix laptop mode on blk-mq (me). - {mq,name} tupple lookup for IO schedulers, allowing us to have alias names. This means you can use 'deadline' on both !mq and on mq (where it's called mq-deadline). (me). - blktrace race fix, oopsing on sg load (me). - blk-mq optimizations (me). - Obscure waitqueue race fix for kyber (Omar). - NBD fixes (Josef). - Disable writeback throttling by default on bfq, like we do on cfq (Luca Miccio). - Series from Ming that enable us to treat flush requests on blk-mq like any other request. This is a really nice cleanup. - Series from Ming that improves merging on blk-mq with schedulers, getting us closer to flipping the switch on scsi-mq again. - BFQ updates (Paolo). - blk-mq atomic flags memory ordering fixes (Peter Z). - Loop cgroup support (Shaohua). - Lots of minor fixes from lots of different folks, both for core and driver code" * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits) nvme: fix visibility of "uuid" ns attribute blk-mq: fixup some comment typos and lengths ide: ide-atapi: fix compile error with defining macro DEBUG blk-mq: improve tag waiting setup for non-shared tags brd: remove unused brd_mutex blk-mq: only run the hardware queue if IO is pending block: avoid null pointer dereference on null disk fs: guard_bio_eod() needs to consider partitions xtensa/simdisk: fix compile error nvme: expose subsys attribute to sysfs nvme: create 'slaves' and 'holders' entries for hidden controllers block: create 'slaves' and 'holders' entries for hidden gendisks nvme: also expose the namespace identification sysfs files for mpath nodes nvme: implement multipath access to nvme subsystems nvme: track shared namespaces nvme: introduce a nvme_ns_ids structure nvme: track subsystems block, nvme: Introduce blk_mq_req_flags_t block, scsi: Make SCSI quiesce and resume work reliably block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag ...
| * nvme: remove handling of multiple AEN requestsKeith Busch2017-11-111-6/+3Star
| | | | | | | | | | | | | | | | | | | | | | The driver can handle tracking only one AEN request, so this patch removes handling for multiple ones. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme-fc: remove unused "queue_size" fieldKeith Busch2017-11-111-6/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | This was being saved in a structure, but never used anywhere. The queue size is obtained through other means, so there's no reason to duplicate this without a user for it. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme: centralize AEN definesKeith Busch2017-11-111-21/+12Star
| | | | | | | | | | | | | | | | | | | | All the transports were unnecessarilly duplicating the AEN request accounting. This patch defines everything in one place. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Guan Junxiong <guanjunxiong@huawei.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme-fc: decouple ns references from lldd referencesJames Smart2017-11-111-6/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the lldd api, a lldd may unregister a remoteport (loss of connectivity or driver unload) or localport (driver unload). The lldd must wait for the remoteport_delete or localport_delete before completing its actions post the unregister. The xxx_deletes currently occur only when the xxxport structure is fully freed after all references are removed. Thus the lldd may be held hostage until an app or in-kernel entity that has a namespace open finally closes so the namespace can be removed, the controller removed, thus the transport objects, thus the lldd. This patch decouples the transport and os-facing objects from the lldd and the remoteport and localport. There is a point in all deletions where the transport will no longer interact with the lldd on behalf of a controller. That point centers around the association established with the target/subsystem. It will access the lldd whenever it attempts to create an association and while the association is active. New associations may only be created if the remoteport is live (thus the localport is live). It will not access the lldd after deleting the association. Therefore, the patch tracks the count of active controllers - those with associations being created or that are active - on a remoteport. It also tracks the number of remoteports that have active controllers, on a a localport. When a remoteport is unregistered, as soon as there are no active controllers, the lldd's remoteport_delete may be called and the lldd may continue. Similarly, when a localport is unregistered, as soon as there are no remoteports with active controllers, the localport_delete callback may be made. This significantly speeds up unregistration with the lldd. The transport objects continue in suspended status with reconnect timers running, and upon expiration, normal ref-counting will occur and the objects will be freed. The transport object may still be held hostage by the application/kernel module, but that is acceptable. With this change, the lldd may be fully unloaded and reloaded, and if registrations occur prior to the timeouts, the nvme controller and namespaces will resume normally as if a link bounce. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme-fc: fix localport resume using stale valuesJames Smart2017-11-111-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | The localport resume was not updating the lldd ops structure. If the lldd is unloaded and reloaded, the ops pointers will differ. Additionally, as there are device references taken by the localport, ensure that resume only resumes if the device matches as well. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * nvme-fc: add dev_loss_tmo timeout and remoteport resume supportJames Smart2017-11-011-39/+239
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a remoteport is unregistered (connectivity lost), the following actions are taken: - the remoteport is marked DELETED - the time when dev_loss_tmo would expire is set in the remoteport - all controllers on the remoteport are reset. After a controller resets, it will stall in a RECONNECTING state waiting for one of the following: - the controller will continue to attempt reconnect per max_retries and reconnect_delay. As no remoteport connectivity, the reconnect attempt will immediately fail. If max reconnects has not been reached, a new reconnect_delay timer will be schedule. If the current time plus another reconnect_delay exceeds when dev_loss_tmo expires on the remote port, then the reconnect_delay will be shortend to schedule no later than when dev_loss_tmo expires. If max reconnect attempts are reached (e.g. ctrl_loss_tmo reached) or dev_loss_tmo ix exceeded without connectivity, the controller is deleted. - the remoteport is re-registered prior to dev_loss_tmo expiring. The resume of the remoteport will immediately attempt to reconnect each of its suspended controllers. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> [hch: updated to use nvme_delete_ctrl] Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: check connectivity before initiating reconnectsJames Smart2017-11-011-7/+16
| | | | | | | | | | | | | | | | | | Check remoteport connectivity before initiating reconnects Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: add a dev_loss_tmo field to the remoteportJames Smart2017-11-011-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a dev_loss_tmo value, paralleling the SCSI FC transport, for device connectivity loss. The transport initializes the value in the nvme_fc_register_remoteport() call. If the value is not set, a default of 60s is set. Add a new routine to the api, nvme_fc_set_remoteport_devloss() routine, which allows the lldd to dynamically update the value on an existing remoteport. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: change ctlr state assignments during reset/reconnectJames Smart2017-11-011-15/+13Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up some of the controller state checks and add the RESETTING->RECONNECTING state transition. Specifically: - the movement of the RESETTING state change and schedule of reset_work to core doesn't work wiht nvme_fc_error_recovery setting state to RECONNECTING before attempting to reset. Remove the state change as the reset request does it. - In the rare cases where an error occurs right as we're transitioning to LIVE, defer the controller start actions. - In error handling on teardown of associations while performing initial controller creation - avoid quiesce calls on the admin_q. They are unneeded. - Add the RESETTING->RECONNECTING transition in the reset handler. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: flush reset_work before safely continuing with delete operationSagi Grimberg2017-11-011-1/+0Star
| | | | | | | | | | | | | | | | | | Prevent racing controller reset and delete flows. reset_work must not ever self-requeue so flushing it suffices. Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: consolidate common code from ->reset_workChristoph Hellwig2017-11-011-13/+0Star
| | | | | | | | | | | | | | | | | | | | No change in behavior except that the FC code cancels two work items a little later now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: move controller deletion to common codeChristoph Hellwig2017-11-011-37/+5Star
| | | | | | | | | | | | | | | | | | | | | | Move the ->delete_work and the associated helpers to common code instead of duplicating them in every driver. This also adds the missing reference get/put for the loop driver. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: merge __nvme_fc_schedule_delete_work into __nvme_fc_del_ctrlChristoph Hellwig2017-11-011-14/+6Star
| | | | | | | | | | | | | | | | | | No need to have two functions doing the same thing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: avoid workqueue flush stallsJames Smart2017-11-011-1/+1
| | | | | | | | | | | | | | | | | | There's no need to wait for the full nvme_wq, which is now shared, to flush. flush only the delete_work item. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sgi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: remove NVME_FC_MAX_SEGMENTSJames Smart2017-10-271-4/+2Star
| | | | | | | | | | | | | | | | | | | | | | The define is an arbitrary limit to the io size on the initiator, capping the io to 1MB-4KB. Remove the define from the transport. I/O size will solely be limited by the LLDD sg limits. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: add support for duplicate_connect optionJames Smart2017-10-271-0/+33
| | | | | | | | | | | | | | | | | | | | | | Adds support for the duplicate_connect option. When set to true, checks whether there's an existing controller via the same host port and target port for the same host (hostnqn, hostid) to the same subsystem. Fails the connection request if an existing controller. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: switch controller refcounting to use struct deviceChristoph Hellwig2017-10-271-6/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of allocating a separate struct device for the character device handle embedd it into struct nvme_ctrl and use it for the main controller refcounting. This removes double refcounting and gets us an automatic reference for the character device operations. We keep ctrl->device as a pointer for now to avoid chaning printks all over, but in the future we could look into message printing helpers that take a controller structure similar to what other subsystems do. Note the delete_ctrl operation always already has a reference (either through sysfs due this change, or because every open file on the /dev/nvme-fabrics node has a refernece) when it is entered now, so we don't need to do the unless_zero variant there. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com>
| * nvme-fc: correct io timeout behaviorJames Smart2017-10-201-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The transport io timeout behavior wasn't quite correct. It ignored that the io error handler is supposed to be synchronous so it possibly allowed the blk request to be restarted while the io associated was still aborting. Timeouts on reserved commands, those used for association create, were never timing out thus they hung out forever. To correct: If an io is times out while a remoteport is not connected, just restart the io timer. The lack of connectivity will simultaneously be resetting the controller, so the reset path will abort and terminate the io. If an io is times out while it was marked for transport abort, just reset the io timer. The abort process is underway and will complete the io. Otherwise, if an io times out, abort the io. If the abort was unsuccessful (unlikely) give up and return not handled. If the abort was successful, as the abort process is underway it will terminate the io, so rather than synchronously waiting, just restart the io timer. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: correct io termination handlingJames Smart2017-10-201-13/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The io completion handling for i/o's that are failing due to to a transport error or association termination had issues, causing io failures (DNR set so retries didn't kick in) or long stalls. Change the io completion handler for the following items: When an io has been completed due to a transport abort (based on an exchange error) or when marked as aborted as part of an association termination (FCOP_FLAGS_TERMIO), set the NVME completion status to NVME_SC_ABORTED. By default, do not set DNR on the status so that a retry can be attempted after association recreate. In cases where an io is failed (non-successful nvme status including aborted), if the controller is being deleted (blk_queue_dying) or the io was part of the ios used for association creation (ctrl state is NEW or RECONNECTING), then additionally set the DNR bit so the io will not be retried. If the failed io was part of association creation, the failure will tear down the partially completioned association and typically restart a new reconnect attempt (another create association later). Rearranged code flow to remove a largely unneeded local variable. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: Add BLK_MQ_F_NO_SCHED flag to admin tag setIsrael Rukshin2017-10-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | Since commit b86dd81 "block: get rid of blk-mq default scheduler choice Kconfig entries", when setting nr_hw_queues to 1 the admin tag set uses mq-deadline scheduler. This flag is useful for admin queues that aren't used for normal IO. Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme: introduce nvme_reinit_tagsetSagi Grimberg2017-10-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move blk_mq_reinit_tagset from blk-mq to nvme core as the only user of it. Current transports that use it (rdma, fc) simply implement .reinit_request op. This patch does not change any functionality. Reviewed-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: move remote port get/put/free locationJames Smart2017-10-051-39/+39
| | | | | | | | | | | | | | | | move nvme_fc_rport_get/put and rport free to higher in the file to avoid adding prototypes to resolve references in upcoming code additions Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: create fc class and transport deviceJames Smart2017-10-041-1/+54
| | | | | | | | | | | | | | | | | | | | Added a new fc class and a device node for udev events under it. I expect the fc class will eventually be the location where the FC SCSI and FC NVME merge in the future. Therefore names are kept somewhat generic. Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvme-fc: add uevent for auto-connectJames Smart2017-10-041-0/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To support auto-connecting to FC-NVME devices upon their dynamic appearance, add a uevent that can kick off connection scripts. uevent is posted against the fc_udev device. patch set tested with the following rule to kick an nvme-cli connect-all for the FC initiator and FC target ports. This is just an example for testing and not intended for real life use. ACTION=="change", SUBSYSTEM=="fc", ENV{FC_EVENT}=="nvmediscovery", \ ENV{NVMEFC_HOST_TRADDR}=="*", ENV{NVMEFC_TRADDR}=="*", \ RUN+="/bin/sh -c '/usr/local/sbin/nvme connect-all --transport=fc --host-traddr=$env{NVMEFC_HOST_TRADDR} --traddr=$env{NVMEFC_TRADDR} >> /tmp/nvme_fc.log'" I will post proposed udev/systemd scripts for possible kernel support. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | nvme-fc: retry initial controller connections 3 timesJames Smart2017-10-181-2/+30
| | | | | | | | | | | | | | | | | | | | | | Currently, if a frame is lost of command fails as part of initial association create for a new controller, the new controller connection request will immediately fail. Add in an immediate 3 retry loop before giving up. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | nvme-fc: fix iowait hangJames Smart2017-10-181-2/+3
|/ | | | | | | | | | | | | Add missing iowait head initialization. Fix irqsave vs irq: wait_event_lock_irq() doesn't do irq save/restore Fixes: 36715cf4b366 ("nvme_fc: replace ioabort msleep loop with completion”) Cc: <stable@vger.kernel.org> # 4.13 Signed-off-by: James Smart <james.smart@broadcom.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com> Tested-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-fc: use transport-specific sgl formatJames Smart2017-09-251-6/+7
| | | | | | | | | | | | | | | | | Sync with NVM Express spec change and FC-NVME 1.18. FC transport sets SGL type to Transport SGL Data Block Descriptor and subtype to transport-specific value 0x0A. Removed the warn-on's on the PRP fields. They are unneeded. They were to check for values from the upper layer that weren't set right, and for the most part were fine. But, with Async events, which reuse the same structure and 2nd time issued the SGL overlay converted them to the Transport SGL values - the warn-on's were errantly firing. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme-fc: remove use of FC-specific error codesJames Smart2017-09-251-4/+4
| | | | | | | | | | The FC-NVME transport used the FC-specific error codes in cases where it had to fabricate an error to go back up stack. Instead of using the FC-specific values, now use a generic value (NVME_SC_INTERNAL). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme-fc: Reattach to localports on re-registrationJames Smart2017-08-281-38/+106
| | | | | | | | | | | | | | | | | | | | | | | | | | | If the LLDD resets or detaches from an fc port, the LLDD will deregister all remoteports seen by the fc port and deregister the localport associated with the fc port. The teardown of the localport structure will be held off due to reference counting until all the remoteports are removed (and they are held off until all controllers/associations to terminated). Currently, if the fc port is reinit/reattached and registered again as a localport it is treated as an independent entity from the prior localport and all prior remoteports and controllers cannot be revived. They are created as new and separate entities. This patch changes the localport registration to look at the known localports that are waiting to be torndown. If they are the same port based on wwn's, the local port is transitioned out of the teardown state. This allows the remote ports and controller connections to be reestablished and resumed as long as the localport can also be reregistered within the timeout windows. The patch adds a new routine nvme_fc_attach_to_unreg_lport() with the functionality and moves the lport get/put routines to avoid forward references. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: Add admin_tagset pointer to nvme_ctrlSagi Grimberg2017-08-281-0/+1
| | | | | | | Will be used when we centralize control flows. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* blk-mq: Make blk_mq_reinit_tagset() calls easier to readBart Van Assche2017-08-181-3/+1Star
| | | | | | | | | | | | | | | | Since blk_mq_ops.reinit_request is only called from inside blk_mq_reinit_tagset(), make this function pointer an argument of blk_mq_reinit_tagset() instead of a member of struct blk_mq_ops. This patch does not change any functionality but makes blk_mq_reinit_tagset() calls easier to read and to analyze. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: James Smart <james.smart@broadcom.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme-fc: revise TRADDR parsingJames Smart2017-07-251-49/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The FC-NVME spec hasn't locked down on the format string for TRADDR. Currently the spec is lobbying for "nn-<16hexdigits>:pn-<16hexdigits>" where the wwn's are hex values but not prefixed by 0x. Most implementations so far expect a string format of "nn-0x<16hexdigits>:pn-0x<16hexdigits>" to be used. The transport uses the match_u64 parser which requires a leading 0x prefix to set the base properly. If it's not there, a match will either fail or return a base 10 value. The resolution in T11 is pushing out. Therefore, to fix things now and to cover any eventuality and any implementations already in the field, this patch adds support for both formats. The change consists of replacing the token matching routine with a routine that validates the fixed string format, and then builds a local copy of the hex name with a 0x prefix before calling the system parser. Note: the same parser routine exists in both the initiator and target transports. Given this is about the only "shared" item, we chose to replicate rather than create an interdendency on some shared code. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-fc: address target disconnect race conditions in fcp io submitJames Smart2017-07-251-8/+11
| | | | | | | | | | | | | | | There are cases where threads are in the process of submitting new io when the LLDD calls in to remove the remote port. In some cases, the next io actually goes to the LLDD, who knows the remoteport isn't present and rejects it. To properly recovery/restart these i/o's we don't want to hard fail them, we want to treat them as temporary resource errors in which a delayed retry will work. Add a couple more checks on remoteport connectivity and commonize the busy response handling when it's seen. Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>