summaryrefslogtreecommitdiffstats
path: root/drivers/nvme/target
Commit message (Collapse)AuthorAgeFilesLines
* nvmet: use strcmp() instead of strncmp() for subsystem lookupBart Van Assche2018-10-171-2/+1Star
| | | | | | | | | | | | | | strncmp() stops comparing when either the end of one of the first two arguments is reached or when 'n' characters have been compared, whichever comes first. That means that strncmp(s1, s2, n) is equivalent to strcmp(s1, s2) if n exceeds the length of s1 or the length of s2. Since that is the case in nvmet_find_get_subsys(), change strncmp() into strcmp(). This patch avoids that the following warning is reported by smatch: drivers/nvme/target/core.c:940:1 nvmet_find_get_subsys() error: strncmp() '"nqn.2014-08.org.nvmexpress.discovery"' too small (37 vs 223) Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet: remove unreachable codeChaitanya Kulkarni2018-10-171-3/+1Star
| | | | | | | | Get rid of the unreachable code in the nvmet_parse_discovery_cmd(). Keep the error message identical to the admin-cmd.c and io-cmd*.c Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet-rdma: use a private workqueue for deleteSagi Grimberg2018-10-051-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Queue deletion is done asynchronous when the last reference on the queue is dropped. Thus, in order to make sure we don't over allocate under a connect/disconnect storm, we let queue deletion complete before making forward progress. However, given that we flush the system_wq from rdma_cm context which runs from a workqueue context, we can have a circular locking complaint [1]. Fix that by using a private workqueue for queue deletion. [1]: ====================================================== WARNING: possible circular locking dependency detected 4.19.0-rc4-dbg+ #3 Not tainted ------------------------------------------------------ kworker/5:0/39 is trying to acquire lock: 00000000a10b6db9 (&id_priv->handler_mutex){+.+.}, at: rdma_destroy_id+0x6f/0x440 [rdma_cm] but task is already holding lock: 00000000331b4e2c ((work_completion)(&queue->release_work)){+.+.}, at: process_one_work+0x3ed/0xa20 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 ((work_completion)(&queue->release_work)){+.+.}: process_one_work+0x474/0xa20 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #2 ((wq_completion)"events"){+.+.}: flush_workqueue+0xf3/0x970 nvmet_rdma_cm_handler+0x133d/0x1734 [nvmet_rdma] cma_ib_req_handler+0x72f/0xf90 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_req_handler+0x135b/0x1c30 [ib_cm] cm_work_handler+0x2b7/0x38cd [ib_cm] process_one_work+0x4ae/0xa20 nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: disconnected (10): status 0 id 0000000040357082 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 nvme nvme0: Reconnecting in 10 seconds... -> #1 (&id_priv->handler_mutex/1){+.+.}: __mutex_lock+0xfe/0xbe0 mutex_lock_nested+0x1b/0x20 cma_ib_req_handler+0x6aa/0xf90 [rdma_cm] cm_process_work+0x2e/0x110 [ib_cm] cm_req_handler+0x135b/0x1c30 [ib_cm] cm_work_handler+0x2b7/0x38cd [ib_cm] process_one_work+0x4ae/0xa20 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 (&id_priv->handler_mutex){+.+.}: lock_acquire+0xc5/0x200 __mutex_lock+0xfe/0xbe0 mutex_lock_nested+0x1b/0x20 rdma_destroy_id+0x6f/0x440 [rdma_cm] nvmet_rdma_release_queue_work+0x8e/0x1b0 [nvmet_rdma] process_one_work+0x4ae/0xa20 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 Fixes: 777dc82395de ("nvmet-rdma: occasionally flush ongoing controller teardown") Reported-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet: don't split large I/Os unconditionallySagi Grimberg2018-10-012-2/+8
| | | | | | | | | | If we know that the I/O size exceeds our inline bio vec, no point using it and split the rest to begin with. We could in theory reuse the inline bio and only allocate the bio_vec, but its really not worth optimizing for. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet_fc: support target port removal with nvmet layerJames Smart2018-10-011-8/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if a targetport has been connected to via the nvmet config (in other words, the add_port() transport routine called, and the nvmet port pointer stored for using in upcalls on new io), and if the targetport is then removed (say the lldd driver decides to unload or fully reset its hardware) and then re-added (the lldd driver reloads or reinits its hardware), the port pointer has been lost so there's no way to continue to post commands up to nvmet via the transport port. Correct by allocating a small "port context" structure that will be linked to by the targetport. The context will save the targetport WWN's and the nvmet port pointer to use for it. Initial allocation will occur when the targetport is bound to via add_port. The context will be deallocated when remove_port() is called. If a targetport is removed while nvmet has the active port context, the targetport will be unlinked from the port context before removal. If a new targetport is registered, the port contexts without a binding are looked through and if the WWN's match (so it's the same as nvmet's port context) the port context is linked to the new target port. Thus new io can be received on the new targetport and operation resumes with nvmet. Additionally, this also resolves nvmet configuration changing out from underneath of the nvme-fc target port (for example: a nvmetcli clear). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-fc: fix for a minor typosMilan P. Gandhi2018-10-011-1/+1
| | | | | | Signed-off-by: Milan P. Gandhi <mgandhi@redhat.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet: remove redundant module prefixChaitanya Kulkarni2018-10-011-1/+1
| | | | | | | | | This patch removes the redundant module prefix used in the pr_err() when nvmet_get_smart_log_nsid() failed to find the namespace provided as a part of smart-log command. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme: count all ANA groups for ANA Log pageHannes Reinecke2018-09-171-0/+4
| | | | | | | | | | When issuing a short read on the ANA log page the number of groups should not change, even though the final returned data might contain less groups than that number. Signed-off-by: Hannes Reinecke <hare@suse.com> [switched to a for loop] Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet-rdma: fix possible bogus dereference under heavy loadSagi Grimberg2018-09-051-2/+25
| | | | | | | | | | | | | | | | | | | | | | Currently we always repost the recv buffer before we send a response capsule back to the host. Since ordering is not guaranteed for send and recv completions, it is posible that we will receive a new request from the host before we got a send completion for the response capsule. Today, we pre-allocate 2x rsps the length of the queue, but in reality, under heavy load there is nothing that is really preventing the gap to expand until we exhaust all our rsps. To fix this, if we don't have any pre-allocated rsps left, we dynamically allocate a rsp and make sure to free it when we are done. If under memory pressure we fail to allocate a rsp, we silently drop the command and wait for the host to retry. Reported-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> [hch: dropped a superflous assignment] Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet: free workqueue object if module init failsChaitanya Kulkarni2018-08-281-1/+3
| | | | | Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-fcloop: Fix dropped LS's to removed target portJames Smart2018-08-281-1/+2
| | | | | | | | | | | | | | | | When a targetport is removed from the config, fcloop will avoid calling the LS done() routine thinking the targetport is gone. This leaves the initiator reset/reconnect hanging as it waits for a status on the Create_Association LS for the reconnect. Change the filter in the LS callback path. If tport null (set when failed validation before "sending to remote port"), be sure to call done. This was the main bug. But, continue the logic that only calls done if tport was set but there is no remoteport (e.g. case where remoteport has been removed, thus host doesn't expect a completion). Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* Merge branch 'linus/master' into rdma.git for-nextJason Gunthorpe2018-08-169-79/+846
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rdma.git merge resolution for the 4.19 merge window Conflicts: drivers/infiniband/core/rdma_core.c - Use the rdma code and revise with the new spelling for atomic_fetch_add_unless drivers/nvme/host/rdma.c - Replace max_sge with max_send_sge in new blk code drivers/nvme/target/rdma.c - Use the blk code and revise to use NULL for ib_post_recv when appropriate - Replace max_sge with max_recv_sge in new blk code net/rds/ib_send.c - Use the net code and revise to use NULL for ib_post_recv when appropriate Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * Merge tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-blockLinus Torvalds2018-08-149-79/+845
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block updates from Jens Axboe: "First pull request for this merge window, there will also be a followup request with some stragglers. This pull request contains: - Fix for a thundering heard issue in the wbt block code (Anchal Agarwal) - A few NVMe pull requests: * Improved tracepoints (Keith) * Larger inline data support for RDMA (Steve Wise) * RDMA setup/teardown fixes (Sagi) * Effects log suppor for NVMe target (Chaitanya Kulkarni) * Buffered IO suppor for NVMe target (Chaitanya Kulkarni) * TP4004 (ANA) support (Christoph) * Various NVMe fixes - Block io-latency controller support. Much needed support for properly containing block devices. (Josef) - Series improving how we handle sense information on the stack (Kees) - Lightnvm fixes and updates/improvements (Mathias/Javier et al) - Zoned device support for null_blk (Matias) - AIX partition fixes (Mauricio Faria de Oliveira) - DIF checksum code made generic (Max Gurtovoy) - Add support for discard in iostats (Michael Callahan / Tejun) - Set of updates for BFQ (Paolo) - Removal of async write support for bsg (Christoph) - Bio page dirtying and clone fixups (Christoph) - Set of bcache fix/changes (via Coly) - Series improving blk-mq queue setup/teardown speed (Ming) - Series improving merging performance on blk-mq (Ming) - Lots of other fixes and cleanups from a slew of folks" * tag 'for-4.19/block-20180812' of git://git.kernel.dk/linux-block: (190 commits) blkcg: Make blkg_root_lookup() work for queues in bypass mode bcache: fix error setting writeback_rate through sysfs interface null_blk: add lock drop/acquire annotation Blk-throttle: reduce tail io latency when iops limit is enforced block: paride: pd: mark expected switch fall-throughs block: Ensure that a request queue is dissociated from the cgroup controller block: Introduce blk_exit_queue() blkcg: Introduce blkg_root_lookup() block: Remove two superfluous #include directives blk-mq: count the hctx as active before allocating tag block: bvec_nr_vecs() returns value for wrong slab bcache: trivial - remove tailing backslash in macro BTREE_FLAG bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section bcache: set max writeback rate when I/O request is idle bcache: add code comments for bset.c bcache: fix mistaken comments in request.c bcache: fix mistaken code comments in bcache.h bcache: add a comment in super.c bcache: avoid unncessary cache prefetch bch_btree_node_get() bcache: display rate debug parameters to 0 when writeback is not running ...
| | * nvmet: add ns write protect supportChaitanya Kulkarni2018-08-085-5/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the Namespace Write Protect feature described in "NVMe TP 4005a Namespace Write Protect". In this version, we implement No Write Protect and Write Protect states for target ns which can be toggled by set-features commands from the host side. For write-protect state transition, we need to flush the ns specified as a part of command so we also add helpers for carrying out synchronous flush operations. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: fixed an incorrect endianess conversion, minor cleanups] Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: use Retain Async Event bit to clear AENChaitanya Kulkarni2018-07-271-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current implementation, we clear the AEN bit when we get the "get log page" command if given log page is associated with AEN. This patch allows optionally retaining the AEN for the ctrl under consideration when Retain Asynchronous Event (RAE) bit is set as a part of "get log page" command. This allows the host to read the Log page and optionally retaining the AEN associated with this log page when using userspace tools like nvme-cli. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> [hch: also use the new helper in the just merged ANA code] Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: support configuring ANA groupsChristoph Hellwig2018-07-274-4/+236
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow creating non-default ANA groups (group ID > 1). Groups are created either by assigning the group ID to a namespace, or by creating a configfs group object under a specific port. All namespaces assigned to a group that doesn't have a configfs object for a given port are marked as inaccessible. Allow changing the ANA state on a per-port basis by creating an ana_groups directory under each port, and another directory with an ana_state file in it. The default ANA group 1 directory is created automatically for each port. For all changes in ANA configuration the ANA change AEN is sent. We only keep a global changecount instead of additional per-group changecounts to keep the implementation as simple as possible. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| | * nvmet: add minimal ANA supportChristoph Hellwig2018-07-274-4/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for Asynchronous Namespace Access as specified in NVMe 1.3 TP 4004. Just add a default ANA group 1 that is optimized on all ports. This is (and will remain) the default assignment for any namespace not epxlicitly assigned to another ANA group. The ANA state can be manually changed through the configfs interface, including the change state. Includes fixes and improvements from Hannes Reinecke. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| | * nvmet: track and limit the number of namespaces per subsystemChristoph Hellwig2018-07-273-1/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TP 4004 introduces a new 'Maximum Number of Allocated Namespaces' field in the Identify controller data to help the host size resources. Put an upper limit on the supported namespaces to be able to support this value as supporting 32-bits worth of namespaces would lead to very large buffers. The limit is completely arbitrary at this point. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| | * nvmet: keep a port pointer in nvmet_ctrlChristoph Hellwig2018-07-272-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will be needed for the ANA AEN code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
| | * nvmet: don't use uuid_le typeAndy Shevchenko2018-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't use sizeof(uuid_le) where none of the parameters is type of uuid_le. Since both arguments are u8 [16], use size of destination there. Moreover, uuid_le is a deprecated type, and nvmet is using uuid_t already. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: check fileio lba range access boundariesSagi Grimberg2018-07-241-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fail out-of-bounds with a proper status code. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: fix file discard return statusSagi Grimberg2018-07-241-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If nvmet_copy_from_sgl failed, we falsly return successful completion status. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvme: cache struct nvme_ctrl reference to struct nvme_requestSagi Grimberg2018-07-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | We will need to reference the controller in the setup and completion time for tracing and future traffic based keep alive support. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet-rdma: add an error flow for post_recv failuresMax Gurtovoy2018-07-231-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Posting receive buffer operation can fail, thus we should make sure to have an error flow during initialization phase. While we're here, add a debug print in case of a failure. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet-rdma: add unlikely check in the fast pathMax Gurtovoy2018-07-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | ib_post_send operation should succeed unless something unusual happened to the ib device. Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet-rdma: support max(16KB, PAGE_SIZE) inline dataSteve Wise2018-07-236-43/+169
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch enables inline data sizes using up to 4 recv sges, and capping the size at 16KB or at least 1 page size. So on a 4K page system, up to 16KB is supported, and for a 64K page system 1 page of 64KB is supported. We avoid > 0 order page allocations for the inline buffers by using multiple recv sges, one for each page. If the device cannot support the configured inline data size due to lack of enough recv sges, then log a warning and reduce the inline size. Add a new configfs port attribute, called param_inline_data_size, to allow configuring the size of inline data for a given nvmf port. The maximum size allowed is still enforced by nvmet-rdma with NVMET_RDMA_MAX_INLINE_DATA_SIZE, which is now max(16KB, PAGE_SIZE). And the default size, if not specified via configfs, is still PAGE_SIZE. This preserves the existing behavior, but allows larger inline sizes for small page systems. If the configured inline data size exceeds NVMET_RDMA_MAX_INLINE_DATA_SIZE, a warning is logged and the size is reduced. If param_inline_data_size is set to 0, then inline data is disabled for that nvmf port. Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: add buffered I/O support for file backed nsChaitanya Kulkarni2018-07-234-5/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new "buffered_io" attribute, which disabled direct I/O and thus enables page cache based caching when enabled. The attribute can only be changed when the namespace is disabled as the file has to be reopend for the change to take effect. The possibly blocking read/write are deferred to a newly introduced global workqueue. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| | * nvmet: add commands supported and effects log pageChaitanya Kulkarni2018-07-231-1/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for Commands Supported and Effects log page (Log Identifier 05h) for NVMeOF. This also makes it easier to find which commands are supported, e.g. :- subnqn : testnqn1 Admin Command Set ACS2 [Get Log Page ] 00000001 ACS6 [Identify ] 00000001 ACS8 [Abort ] 00000001 ACS9 [Set Features ] 00000001 ACS10 [Get Features ] 00000001 ACS12 [Asynchronous Event Request ] 00000001 ACS24 [Keep Alive ] 00000001 NVM Command Set IOCS0 [Flush ] 00000001 IOCS1 [Write ] 00000001 IOCS2 [Read ] 00000001 IOCS8 [Write Zeroes ] 00000001 IOCS9 [Dataset Management ] 00000001 This partticular functionality can be used from the host side to examine the NVMeOF ctrl commands supported. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | Merge tag 'v4.18' into rdma.git for-nextJason Gunthorpe2018-08-164-13/+52
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve merge conflicts from the -rc cycle against the rdma.git tree: Conflicts: drivers/infiniband/core/uverbs_cmd.c - New ifs added to ib_uverbs_ex_create_flow in -rc and for-next - Merge removal of file->ucontext in for-next with new code in -rc drivers/infiniband/core/uverbs_main.c - for-next removed code from ib_uverbs_write() that was modified in for-rc Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
| * | nvmet: only check for filebacking on -ENOTBLKHannes Reinecke2018-07-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need to check for a file-backed namespace if nvmet_bdev_ns_enable() returns -ENOTBLK. For any other error it's pointless as the open() error will remain the same. Fixes: d5eff33e ("nvmet: add simple file backed ns support") Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvmet: fixup crash on NULL device pathHannes Reinecke2018-07-251-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When writing an empty string into the device_path attribute the kernel will crash with nvmet: failed to open block device (null): (-22) BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 This patch sanitizes the error handling for invalid device path settings. Fixes: a07b4970 ("nvmet: add a generic NVMe target") Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * | nvme: if_ready checks to fail io to deleting controllerJames Smart2018-07-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The revised if_ready checks skipped over the case of returning error when the controller is being deleted. Instead it was returning BUSY, which caused the ios to retry, which caused the ns delete to hang waiting for the ios to drain. Stack trace of hang looks like: kworker/u64:2 D 0 74 2 0x80000000 Workqueue: nvme-delete-wq nvme_delete_ctrl_work [nvme_core] Call Trace: ? __schedule+0x26d/0x820 schedule+0x32/0x80 blk_mq_freeze_queue_wait+0x36/0x80 ? remove_wait_queue+0x60/0x60 blk_cleanup_queue+0x72/0x160 nvme_ns_remove+0x106/0x140 [nvme_core] nvme_remove_namespaces+0x7e/0xa0 [nvme_core] nvme_delete_ctrl_work+0x4d/0x80 [nvme_core] process_one_work+0x160/0x350 worker_thread+0x1c3/0x3d0 kthread+0xf5/0x130 ? process_one_work+0x350/0x350 ? kthread_bind+0x10/0x10 ret_from_fork+0x1f/0x30 Extend nvmf_fail_nonready_command() to supply the controller pointer so that the controller state can be looked at. Fail any io to a controller that is deleting. Fixes: 3bc32bb1186c ("nvme-fabrics: refactor queue ready check") Fixes: 35897b920c8a ("nvme-fabrics: fix and refine state checks in __nvmf_check_ready") Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com>
| * | nvmet-fc: fix target sgl list on large transfersJames Smart2018-07-241-9/+35
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing code to carve up the sg list expected an sg element-per-page which can be very incorrect with iommu's remapping multiple memory pages to fewer bus addresses. To hit this error required a large io payload (greater than 256k) and a system that maps on a per-page basis. It's possible that large ios could get by fine if the system condensed the sgl list into the first 64 elements. This patch corrects the sg list handling by specifically walking the sg list element by element and attempting to divide the transfer up on a per-sg element boundary. While doing so, it still tries to keep sequences under 256k, but will exceed that rule if a single sg element is larger than 256k. Fixes: 48fa362b6c3f ("nvmet-fc: simplify sg list handling") Cc: <stable@vger.kernel.org> # 4.14 Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
| * nvmet: reset keep alive timer in controller enableMax Gurtuvoy2018-06-201-0/+8
| | | | | | | | | | | | | | | | | | | | | | Controllers that are not yet enabled should not really enforce keep alive timeouts, but we still want to track a timeout and cleanup in case a host died before it enabled the controller. Hence, simply reset the keep alive timer when the controller is enabled. Suggested-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | nvmet-rdma: Simplify ib_post_(send|recv|srq_recv)() callsBart Van Assche2018-07-251-6/+4Star
| | | | | | | | | | | | | | | | | | | | Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL as third argument to ib_post_(send|recv|srq_recv)(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* | IB/core: add max_send_sge and max_recv_sge attributesSteve Wise2018-06-181-1/+1
|/ | | | | | | | | | | | | | | | | | | This patch replaces the ib_device_attr.max_sge with max_send_sge and max_recv_sge. It allows ulps to take advantage of devices that have very different send and recv sge depths. For example cxgb4 has a max_recv_sge of 4, yet a max_send_sge of 16. Splitting out these attributes allows much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW API. Consider a large RDMA WRITE that has 16 scattergather entries. With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of 16, it can be done with 1 WRITE WR. Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
* nvme-fabrics: refactor queue ready checkChristoph Hellwig2018-06-151-4/+3Star
| | | | | | | | | | | | | Move the is_connected check to the fibre channel transport, as it has no meaning for other transports. To facilitate this split out a new nvmf_fail_nonready_command helper that is called by the transport when it is asked to handle a command on a queue that is not ready. Also avoid a function call for the queue live fast path by inlining the check. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: James Smart <james.smart@broadcom.com>
* nvmet: free smart-log buffer after useChaitanya Kulkarni2018-06-111-1/+3
| | | | | | | Free smart-log buffer allocated in the function after use. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet: filter newlines from user inputSagi Grimberg2018-06-081-5/+9
| | | | | | | | | | We should avoid consuming the newlines in traddr, trsvcid and device_path. Add minimal processing to make sure they are gone. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvmet: return all zeroed buffer when we can't find an active namespaceChristoph Hellwig2018-06-081-6/+9
| | | | | | | | | | | | | | | | Quote from Figure 106 in NVMe 1.3a: The Identify Namespace data structure is returned to the host for the namespace specified in the Namespace Identifier (CDW1.NSID) field if it is an active NSID. If the specified namespace is not an active NSID, then the controller returns a zero filled data structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@rimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvmet: mask pending AENsChristoph Hellwig2018-06-013-1/+10
| | | | | | | | | | | | | | Per section 5.2 of the NVMe 1.3 spec: "When the controller posts a completion queue entry for an outstanding Asynchronous Event Request command and thus reports an asynchronous event, subsequent events of that event type are automatically masked by the controller until the host clears that event. An event is cleared by reading the log page associated with that event using the Get Log Page command (see section 5.14)." Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* nvmet: add AEN configuration supportChristoph Hellwig2018-06-013-2/+32
| | | | | | | | | | | AEN configuration via the 'Get Features' and 'Set Features' admin command is mandatory, so we should be implemeting handling for it. Signed-off-by: Hannes Reinecke <hare@suse.com> [hch: use WRITE_ONCE, check for invalid values] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
* nvmet: implement the changed namespaces logChristoph Hellwig2018-06-013-9/+76
| | | | | | | | | Just keep a per-controller buffer of changed namespaces and copy it out in the get log page implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com>
* nvmet: split log page implementationChristoph Hellwig2018-06-011-63/+36Star
| | | | | | | | | | | Remove the common code to allocate a buffer and copy it into the SGL. Instead the two no-op implementations just zero the SGL directly, and the smart log allocates a buffer on its own. This prepares for the more elaborate ANA log page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
* nvmet: add a new nvmet_zero_sgl helperChristoph Hellwig2018-06-012-0/+8
| | | | | | | | Zeroes the SGL in the payload. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
* nvmet: fix error return code in nvmet_file_ns_enable()Wei Yongjun2018-05-311-2/+6
| | | | | | | | | | Fix to return error code -ENOMEM from the memory alloc fail error handling case instead of 0, as done elsewhere in this function. Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.e> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvmet: fix a typo in nvmet_file_ns_enable()Wei Yongjun2018-05-311-1/+1
| | | | | | | | | Fix a typo in nvmet_file_ns_enable(). Fixes: d5eff33ee6f8 ("nvmet: add simple file backed ns support") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.e> Signed-off-by: Christoph Hellwig <hch@lst.de>
* nvme-loop: add support for multiple portsChristoph Hellwig2018-05-302-19/+31
| | | | | | | This is useful at least for multipath testing. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
* Merge branch 'nvme-4.18-2' of git://git.infradead.org/nvme into for-4.18/blockJens Axboe2018-05-2910-67/+417
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe changes from Christoph: "Here is the current batch of nvme updates for 4.18, we have a few more patches in the queue, but I'd like to get this pile into your tree and linux-next ASAP. The biggest item is support for file-backed namespaces in the NVMe target from Chaitanya, in addition to that we mostly small fixes from all the usual suspects." * 'nvme-4.18-2' of git://git.infradead.org/nvme: nvme: fixup memory leak in nvme_init_identify() nvme: fix KASAN warning when parsing host nqn nvmet-loop: use nr_phys_segments when map rq to sgl nvmet-fc: increase LS buffer count per fc port nvmet: add simple file backed ns support nvmet: remove duplicate NULL initialization for req->ns nvmet: make a few error messages more generic nvme-fabrics: allow duplicate connections to the discovery controller nvme-fabrics: centralize discovery controller defaults nvme-fabrics: remove unnecessary controller subnqn validation nvme-fc: remove setting DNR on exception conditions nvme-rdma: stop admin queue before freeing it nvme-pci: Fix AER reset handling nvme-pci: set nvmeq->cq_vector after alloc cq/sq nvme: host: core: fix precedence of ternary operator nvme: fix lockdep warning in nvme_mpath_clear_current_path
| * nvmet-loop: use nr_phys_segments when map rq to sglChaitanya Kulkarni2018-05-251-1/+1
| | | | | | | | | | | | | | | | | | | | Use blk_rq_nr_phys_segments() instead of blk_rq_payload_bytes() to check if a command contains data to me mapped. This fixes the case where a struct requests contains LBAs, but no data will actually be send, e.g. the pending Write Zeroes support. Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de>