summaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
Commit message (Collapse)AuthorAgeFilesLines
...
| | * | | IB/core: Export ib_create/destroy_flow through uverbsHadar Hen Zion2013-08-283-1/+229
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to support flow steering for user space applications. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * | | IB/core: Infrastructure for extensible uverbs commandsIgor Ivanov2013-08-281-5/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add infrastructure to support extended uverbs capabilities in a forward/backward manner. Uverbs command opcodes which are based on the verbs extensions approach should be greater or equal to IB_USER_VERBS_CMD_THRESHOLD. They have new header format and processed a bit differently. Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means it needs to have additional arguments, we will be able to add them without creating a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version. This patch for itself doesn't provide the whole scheme which is also dependent on adding a comp_mask field to each extended uverbs command struct. The new header framework allows for future extension of the CMD arguments (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing new command (that is a command that supports the new uverbs command header format suggested in this patch) w/o bumping ABI version and with maintaining backward and formward compatibility to new and old libibverbs versions. In the uverbs command we are passing both uverbs arguments and the provider arguments. We split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry the provider input argument size. Same goes for the response (the uverbs CMD output argument). For example take the create_cq call and the mlx4_ib provider: The uverbs layer gets libibverb's struct ibv_create_cq (named struct ib_uverbs_create_cq in the kernel), mlx4_ib gets libmlx4's struct mlx4_create_cq (which includes struct ibv_create_cq and is named struct mlx4_ib_create_cq in the kernel) and in_words = sizeof(mlx4_create_cq)/4 . Thus ib_uverbs_cmd_hdr.in_words carry both uverbs plus mlx4_ib input argument sizes, where uverbs assumes it knows the size of its input argument - struct ibv_create_cq. Now, if we wish to add a variable to struct ibv_create_cq, we can add a comp_mask field to the struct which is basically bit field indicating which fields exists in the struct (as done for the libibverbs API extension), but we need a way to tell what is the total size of the struct and not assume the struct size is predefined (since we may get different struct sizes from different user libibverbs versions). So we know at which point the provider input argument (struct mlx4_create_cq) begins. Same goes for extending the provider struct mlx4_create_cq. Thus we split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry the provider (mlx4_ib) input argument size. Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * | | IB/core: Add receive flow steering supportHadar Hen Zion2013-08-281-0/+27
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RDMA stack allows for applications to create IB_QPT_RAW_PACKET QPs, which receive plain Ethernet packets, specifically packets that don't carry any QPN to be matched by the receiving side. Applications using these QPs must be provided with a method to program some steering rule with the HW so packets arriving at the local port can be routed to them. This patch adds ib_create_flow(), which allow providing a flow specification for a QP. When there's a match between the specification and a received packet, the packet is forwarded to that QP, in a the same way one uses ib_attach_multicast() for IB UD multicast handling. Flow specifications are provided as instances of struct ib_flow_spec_yyy, which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4, TCP and UDP are defined. Flow specs are made of values and masks. The input to ib_create_flow() is a struct ib_flow_attr, which contains a few mandatory control elements and optional flow specs. struct ib_flow_attr { enum ib_flow_attr_type type; u16 size; u16 priority; u32 flags; u8 num_of_specs; u8 port; /* Following are the optional layers according to user request * struct ib_flow_spec_yyy * struct ib_flow_spec_zzz */ }; As these specs are eventually coming from user space, they are defined and used in a way which allows adding new spec types without kernel/user ABI change, just with a little API enhancement which defines the newly added spec. The flow spec structures are defined with TLV (Type-Length-Value) entries, which allows calling ib_create_flow() with a list of variable length of optional specs. For the actual processing of ib_flow_attr the driver uses the number of specs and the size mandatory fields along with the TLV nature of the specs. Steering rules processing order is according to the domain over which the rule is set and the rule priority. All rules set by user space applicatations fall into the IB_FLOW_DOMAIN_USER domain, other domains could be used by future IPoIB RFS and Ethetool flow-steering interface implementation. Lower numerical value for the priority field means higher priority. The returned value from ib_create_flow() is a struct ib_flow, which contains a database pointer (handle) provided by the HW driver to be used when calling ib_destroy_flow(). Applications that offload TCP/IP traffic can also be written over IB UD QPs. The ib_create_flow() / ib_destroy_flow() API is designed to support UD QPs too. A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING to denote support for flow steering. The ib_flow_attr enum type supports usage of flow steering for promiscuous and sniffer purposes: IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive all Ethernet traffic which isn't steered to any QP IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Issue RI.FINI before closing when entering TERMSteve Wise2013-08-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Advertise ~0ULL as max MR sizeSteve Wise2013-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lustre uses a advertised max MR size of ~0ULL to indicate it should use a dma_mr. Hence advertise max MR size as ~0ULL. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Always do GTS write if cidx_inc == CIDXINC_MASKSteve Wise2013-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When polling, we do a GTS update if the accumulated cidx_inc == the CQ depth / 16. However, if the CQ is large enough, Cq depth / 16 exceeds the size of the field in the GTS word. So we also need to update if cidx_inc hits CIDXINC_MASK to avoid overflowing the field. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Set arp error handler for PASS_ACCEPT_RPL messagesSteve Wise2013-08-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | accept_cr() failed to set the arp error handler on a reused skb. This results in a kernel crash if the arp does indeed time out. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrapSteve Wise2013-08-131-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When determining how many WRs are completed with a signaled CQE, correctly deal with queue wraps. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Fix QP flush logicSteve Wise2013-08-134-136/+254
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes following fixes in QP flush logic: - correctly flushes unsignaled WRs followed by a signaled WR - supports for flushing a CQ bound to multiple QPs - resets cidx_flush if a active queue starts getting HW CQEs again - marks WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too so that post_send/post_recv will start returning the appropriate error synchronously - eats unsignaled read resp CQEs. HW always inserts CQEs so we must silently discard them if the read work request was unsignaled. - handles QP flushes with pending SW CQEs. The flush and out of order completion logic has a bug where if out of order completions are flushed but not yet polled by the consumer and the qp is then flushed then we end up inserting duplicate completions. - c4iw_flush_sq() should only flush wrs that have not already been flushed. Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining. This bug only caused a problem in the presence of unsignaled work requests. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fixed sparse warning due to htonl/ntohl confusion. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Handle newer firmware changesSteve Wise2013-08-132-13/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move QP to TERMINATE instead to allow the peer to get the TERM message. This bug wasn't detectable until newer FW that moves connections out of RDMA mode as soon as an error is detected. QP can exit RTS before the last AE arrives. This was introduced by changes in the FW to kick connections out of RDMA mode as soon as an error is detected. A side effect of this is that the driver can move the QP out of RTS before the AE causing the connection to get kicked out of RDMA mode is processed. Fix for this is to always post async errors even if the QP is out of RTS. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Use correct bit shift macros for vlan filter tuplesSteve Wise2013-08-131-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cxgb4: Add support for active and passive open connection with IPv6 addressVipul Pandya2013-08-134-300/+636
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add new cpl messages, cpl_act_open_req6 and cpl_t5_act_open_req6, for initiating active open connections. Use LLD api cxgb4_create_server and cxgb4_create_server6 for initiating passive open connections. Similarly use cxgb4_remove_server to remove the passive open connections in place of listen_stop. Add support for iWARP over VLAN device and enable IPv6 support on VLAN device. Make use of import_ep in c4iw_reconnect. Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fix build when IPv6 is disabled and make sure iw_cxgb4 is not built-in when ipv6 is a module. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | RDMA/cma: Add IPv6 support for iWARPSteve Wise2013-08-126-148/+184
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify the type of local_addr and remote_addr fields in struct iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold IPv6 and IPv4 addresses uniformly. Change the references of local_addr and remote_addr in cxgb4, cxgb3, nes and amso drivers to match this. However to be able to actully run traffic over IPv6, low-level drivers have to add code to support this. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> [ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
* | | Merge tag 'PTR_RET-for-linus' of ↵Linus Torvalds2013-09-051-1/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull PTR_RET() removal patches from Rusty Russell: "PTR_RET() is a weird name, and led to some confusing usage. We ended up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages. This has been sitting in linux-next for a whole cycle" [ There are still some PTR_RET users scattered about, with some of them possibly being new, but most of them existing in Rusty's tree too. We have that #define PTR_RET(p) PTR_ERR_OR_ZERO(p) thing in <linux/err.h>, so they continue to work for now - Linus ] * tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR(). staging/zcache: don't use PTR_RET(). remoteproc: don't use PTR_RET(). pinctrl: don't use PTR_RET(). acpi: Replace weird use of PTR_RET. s390: Replace weird use of PTR_RET. PTR_RET is now PTR_ERR_OR_ZERO(): Replace most. PTR_RET is now PTR_ERR_OR_ZERO
| * | | PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.Rusty Russell2013-07-151-1/+1
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sweep of the simple cases. Cc: netdev@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Cc: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | [SCSI] IB/iser: Add Discovery supportOr Gerlitz2013-08-262-2/+12
| |/ |/| | | | | | | | | | | | | | | | | | | | | To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability towards user space. Also need to make sure the login RX buffer is posted when SendTargets TEXT PDUs are sent. For that end, we use a setting of the ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is discovery session. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
| |
| \
| \
| \
| \
| \
| \
| \
| \
| \
| \
| \
| \
| \
| \
| \
*---------------. \ Merge branches 'cma', 'cxgb3', 'cxgb4', 'ipoib', 'misc', 'mlx4', 'mlx5', ↵Roland Dreier2013-07-3115-28/+110
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'nes', 'ocrdma' and 'qib' into for-next
| | | | | | | | | * | IB/qib: Add err_decode() call for ring dumpMike Marciniszyn2013-07-302-1/+3
| | | | | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0b3ddf380ca7 ("Log all SDMA errors unconditionally") missed part of the patch. This also corrects a format warning when dma_addr_t is 32 bits on a 64 bit system. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | * | RDMA/ocrdma: Fix several stack info leaksDan Carpenter2013-07-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A grab bag of places which don't properly initialize stack data. I removed one place which cleared ".rsvd" because it's not needed now that I have added a memset() earlier in the function. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | * | RDMA/ocrdma: Remove unused includeRoland Dreier2013-07-261-1/+0Star
| | | | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'd like to remove rdma/ib_cache.h some day, so let's avoid proliferating uses of it unnecessarily. Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * | Revert "RDMA/nes: Fix compilation error when nes_debug is enabled"Roland Dreier2013-07-311-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bca1935ccdec, which removes variables nes_tcp_state_str and nes_iwarp_state_str, assuming that they aren't defined. However, they are defined within a #ifdef NES_DEBUG statement, which if enabled causes "defined but not used" compiler warning, when the variables are removed. Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * | RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq()Dan Carpenter2013-07-301-1/+2
| | | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We pass a few bytes of uninitialized stack memory to the user here. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | mlx5_core: Variable may be used uninitializedAndi Shyti2013-07-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the sq_overhead() function, if qp_typ is equal to IB_QPT_RC, size will be used uninitialized. Signed-off-by: Andi Shyti <andi@etezian.org> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()Dan Carpenter2013-07-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't set "resp.reserved". Since it's at the end of the struct that means we don't have to copy it to the user. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Eli Cohen <eli@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | IB/mlx5: Fix error return code in init_one()Wei Yongjun2013-07-311-3/+5
| | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * / IB/mlx4: Use default pkey when creating tunnel QPsJack Morgenstein2013-07-311-2/+8
| | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating tunnel QPs for special QP tunneling, look for the default pkey in the slave's virtual pkey table. If it is present, use the real pkey index where the default pkey is located. If the default pkey is not found in the pkey table, use the real pkey index which is stored at index 0 in the slave's virtual pkey table (this is the current behavior). This change is required to support cloud computing, where the paravirtualized index of the default pkey is moved to index 1 or higher. The pkey at paravirtualized index 0 is used for the default IPoIB interface created by the VF. Its possible for the pkey value at paravirtualized index 0 to be invalid (zero) at VF probe time (pkey index 0 is mapped to real pkey index 127, which contains pkey = 0). At some point after the VF probe, the cloud computing interface at the hypervisor maps virtual index 0 for the VF to the pkey index containing the pkey that IPoIB will use in its operation. However, when the tunnel QP is created, the pkey at the slave's virtual index 0 is still mapped to the invalid pkey index, so tunnel QP creation fails. This commit causes the hypervisor to search for the default pkey in the slave's pkey table -- and this pkey is present in the table (at index > 0) at tunnel QP creation time, so that the tunnel QP creation will succeed. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | * / IB/core: Create QP1 using the pkey index which contains the default pkeyJack Morgenstein2013-07-311-1/+7
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, QP1 is created using pkey index 0. This patch simply looks for the index containing the default pkey, rather than hard-coding pkey index 0. This change will have no effect in native mode, since QP0 and QP1 are created before the SM configures the port, so pkey table will still be the default table defined by the IB Spec, in C10-123: "If non-volatile storage is not used to hold P_Key Table contents, then if a PM (Partition Manager) is not present, and prior to PM initialization of the P_Key Table, the P_Key Table must act as if it contains a single valid entry, at P_Key_ix = 0, containing the default partition key. All other entries in the P_Key Table must be invalid." Thus, in the native mode case, the driver will find the default pkey at index 0 (so it will be no different than the hard-coding). However, in SR-IOV mode, for VFs, the pkey table may be paravirtualized, so that the VF's pkey index zero may not necessarily be mapped to the real pkey index 0. For VFs, therefore, it is important to find the virtual index which maps to the real default pkey. This commit does the following for QP1 creation: 1. Find the pkey index containing the default pkey, and use that index if found. ib_find_pkey() returns the index of the limited-membership default pkey (0x7FFF) if the full-member default pkey is not in the table. 2. If neither form of the default pkey is found, use pkey index 0 (previous behavior). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * | IPoIB: Fix pkey change flow for virtualization environmentsErez Shitrit2013-07-311-13/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPoIB's required behaviour w.r.t to the pkey used by the device is the following: - For "parent" interfaces (e.g ib0, ib1, etc) who are created automatically as a result of hot-plug events from the IB core, the driver needs to take whatever pkey vlaue it finds in index 0, and stick to that index. - For child interfaces (e.g ib0.8001, etc) created by admin directive, the driver needs to use and stick to the value provided during its creation. In SR-IOV environment its possible for the VF probe to take place before the cloud management software provisions the suitable pkey for the VF in the paravirtualed PKEY table index 0. When this is the case, the VF IB stack will find in index 0 an invalide pkey, which is all zeros. Moreover, the cloud managment can assign the pkey value at index 0 at any time of the guest life cycle. The correct behavior for IPoIB to address these requirements for parent interfaces is to use PKEY_CHANGE event as trigger to optionally re-init the device pkey value and re-create all the relevant resources accordingly, if the value of the pkey in index 0 has changed (from invalid to valid or from valid value X to invalid value Y). This patch enhances the heavy flushing code which is triggered by pkey change event, to behave correctly for parent devices. For child devices, the code remains the same, namely chases pkey value and not index. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * | IPoIB: Make sure child devices use valid/proper pkeysOr Gerlitz2013-07-312-1/+10
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for child devices. Also, make sure to always set the full membership bit for the pkey of devices created by rtnl link ops. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * / RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()Dan Carpenter2013-07-301-0/+2
| | |/ | | | | | | | | | | | | | | | | | | | | | "uresp.ma_sync_key" doesn't get set on this path so we leak 8 bytes of data. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * / RDMA/cxgb3: Fix stack info leak in iwch_create_cq()Dan Carpenter2013-07-301-0/+1
| |/ | | | | | | | | | | | | | | | | The "uresp.reserved" field isn't initialized on this path so it could leak uninitialized stack information to the user. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | RDMA/cma: Only call cma_save_ib_info() for CM REQsSean Hefty2013-07-311-1/+2
| | | | | | | | | | | | | | | | Calling cma_save_ib_info() for CM SIDR REQs results in a crash accessing an invalid path record pointer. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | RDMA/cma: Fix accessing invalid private data for UDSean Hefty2013-07-311-8/+11
| | | | | | | | | | | | | | | | | | If a application is using AF_IB with a UD QP, but does not provide any private data, we will end up accessing invalid memory. Check for this case and handle it appropriately. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | RDMA/cma: Fix gcc warningPaul Bolle2013-07-311-4/+3Star
|/ | | | | | | | | | | | | | | | Building cma.o triggers this gcc warning: drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’: drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized] drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here This is a false positive, as "port" will always be initialized if we're at "found". But if we assign to "id_priv->id.port_num" directly, we can drop "port". That will, obviously, silence gcc. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge tag 'rdma-for-linus' of ↵Linus Torvalds2013-07-1347-786/+9961
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - AF_IB (native IB addressing) for CMA from Sean Hefty - new mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes) - SRP fixes from Bart Van Assche (including fix to first merge request) - qib HW driver updates - resurrection of ocrdma HW driver development - uverbs conversion to create fds with O_CLOEXEC set - other small changes and fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits) mlx5: Return -EFAULT instead of -EPERM IB/qib: Log all SDMA errors unconditionally IB/qib: Fix module-level leak mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd() mlx5_core: Fixes for sparse warnings IB/mlx5: Make profile[] static in main.c mlx5: Fix parameter type of health_handler_t mlx5: Add driver for Mellanox Connect-IB adapters IB/core: Add reserved values to enums for low-level driver use IB/srp: Bump driver version and release date IB/srp: Make HCA completion vector configurable IB/srp: Maintain a single connection per I_T nexus IB/srp: Fail I/O fast if target offline IB/srp: Skip host settle delay IB/srp: Avoid skipping srp_reset_host() after a transport error IB/srp: Fix remove_one crash due to resource exhaustion IB/qib: New transmitter tunning settings for Dell 1.1 backplane IB/core: Fix error return code in add_port() ...
| *---. Merge branches 'mlx5', 'qib' and 'srp' into for-nextRoland Dreier2013-07-126-10/+179
| |\ \ \
| | | | * IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offlineBart Van Assche2013-07-121-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the transport layer is offline it is more appropriate to let srp_abort() return FAST_IO_FAIL instead of SUCCESS. Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * | IB/qib: Log all SDMA errors unconditionallyDean Luick2013-07-123-1/+171
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds code to log SDMA errors for supportability purposes. Signed-off-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * | IB/qib: Fix module-level leakMike Marciniszyn2013-07-121-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vzalloc()'ed field physshadow is leaked on module unload. This patch adds vfree after the sibling page shadow is freed. Reported-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * | | mlx5: Return -EFAULT instead of -EPERMDan Carpenter2013-07-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For copy_to/from_user() failure, the correct error code is -EFAULT not -EPERM. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | |
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| | \ \ \
| *---------. \ \ \ Merge branches 'af_ib', 'cxgb4', 'misc', 'mlx5', 'ocrdma', 'qib' and 'srp' ↵Roland Dreier2013-07-0842-355/+8962
| |\ \ \ \ \ \ \ \ \ | | | | | |_|_|/ / / | | | | |/| | | / / | | | | | | | |/ / | | | | | | |/| / | | | | | | | |/ into for-next
| | | | | | | * IB/srp: Bump driver version and release dateVu Pham2013-07-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Vu Pham <vu@mellanox.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/srp: Make HCA completion vector configurableBart Van Assche2013-07-012-2/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several InfiniBand HCAs allow configuring the completion vector per CQ. This allows spreading the workload created by IB completion interrupts over multiple MSI-X vectors and hence over multiple CPU cores. In other words, configuring the completion vector properly not only allows reducing latency on an initiator connected to multiple SRP targets but also allows improving throughput. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/srp: Maintain a single connection per I_T nexusBart Van Assche2013-07-011-2/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An SRP target is required to maintain a single connection between initiator and target. This means that if the 'add_target' attribute is used to create a second connection to a target, the first connection will be logged out and that the SCSI error handler will kick in. The SCSI error handler will cause the SRP initiator to reconnect, which will cause I/O over the second connection to fail. Avoid such ping-pong behavior by disabling relogins. If reconnecting manually is necessary, that is possible by deleting and recreating an rport via sysfs. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/srp: Fail I/O fast if target offlineBart Van Assche2013-07-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If reconnecting failed we know that no command completion will be received anymore. Hence let the SCSI error handler fail such commands immediately. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/srp: Skip host settle delayBart Van Assche2013-06-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SRP initiator implements host reset by reconnecting to the SRP target. That means that communication with the target is possible as soon as host reset finished. Hence skip the host settle delay. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Reviewed-by: Christoph Hellwig <hch@infradead.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/srp: Avoid skipping srp_reset_host() after a transport errorBart Van Assche2013-06-281-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SCSI error handler assumes that the transport layer is operational if an eh_abort_handler() returns SUCCESS. Hence srp_abort() only should return SUCCESS if sending the ABORT TASK task management function succeeded. This patch avoids the SCSI error handler skipping the srp_reset_host() call after a transport layer error. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/srp: Fix remove_one crash due to resource exhaustionDotan Barak2013-06-281-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the add_one callback fails during driver load no resources are allocated so there isn't a need to release any resources. Trying to clean the resource may lead to the following kernel panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp] RIP: 0010:[<ffffffffa0132331>] [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp] Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0) Call Trace: [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core] [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp] [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com> Acked-by: David Dillow <dillowda@ornl.gov> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | IB/qib: New transmitter tunning settings for Dell 1.1 backplaneMitko Haralanov2013-06-261-22/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Dell blade chassis got an updated backplane which requires new transmitter tuning settings. Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | IB/qib: Add qp_stats debug fileMike Marciniszyn2013-06-223-1/+161
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a seq_file iterator for reporting the QP hash table when the qp_stats file is read. Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>