path: root/drivers/infiniband
Commit message (Author, Age, Files changed, Lines -/+)
...
| * | | IB/mlx4: Increase maximal message size under UD QP  (Mark Bloch, 2017-11-13; 1 file, -1/+1)
    Maximal message size should be used as a limit on the maximum message payload
    allowed, not including the headers. The ConnectX-3 check against this value is
    done including the headers, so when the payload is 4K the NIC will drop
    packets. Increase the maximal message size to 8K as a workaround; this
    shouldn't change current behaviour because we continue to set the MTU to 4K.

    To reproduce, set the MTU to 4296 on the corresponding interface, for example:
        ifconfig eth0 mtu 4296 (both server and client)
    On server: ib_send_bw -c UD -d mlx4_0 -s 4096 -n 1000000 -i1 -m 4096
    On client: ib_send_bw -d mlx4_0 -c UD <server_ip> -s 4096 -n 1000000 -i 1 -m 4096

    Fixes: 6e0d733d9215 ("IB/mlx4: Allow 4K messages for UD QPs")
    Signed-off-by: Mark Bloch <markb@mellanox.com>
    Reviewed-by: Majd Dibbiny <majd@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | IB/mlx4: Add contig support for control objects  (Guy Levi, 2017-11-13; 4 files, -7/+16)
    Take advantage of the optimization introduced in the previous commit
    ("IB/mlx4: Use optimal numbers of MTT entries") to optimize the MTT usage
    for QP and CQ.

    Signed-off-by: Guy Levi <guyle@mellanox.com>
    Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | IB/mlx4: Use optimal numbers of MTT entries  (Guy Levi, 2017-11-13; 1 file, -24/+261)
    Optimize device performance by assigning multiple contiguous physical pages
    to a single MTT. As a result, the number of MTTs is reduced, which in turn
    reduces MTT cache misses.

    Signed-off-by: Guy Levi <guyle@mellanox.com>
    Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | IB/mlx5: Fix RoCE Address Path fields  (Majd Dibbiny, 2017-11-13; 1 file, -2/+6)
    When working over a RoCE network, the UDP source port should be set only
    for statically connected QPs (RC, UC and XRC).

    Fixes: 2811ba51b049 ("IB/mlx5: Add RoCE fields to Address Vector")
    Signed-off-by: Majd Dibbiny <majd@mellanox.com>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | IB/mlx5: Assign send CQ and recv CQ of UMR QP  (Majd Dibbiny, 2017-11-13; 1 file, -0/+2)
    The UMR QP is created by calling mlx5_ib_create_qp directly, so the send CQ
    and the recv CQ on the ibqp were never assigned. Assign them right after
    calling mlx5_ib_create_qp to ensure that any access to those pointers works
    as expected and won't crash the system, as might happen as part of the
    reset flow.

    Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
    Signed-off-by: Majd Dibbiny <majd@mellanox.com>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
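    A minimal sketch of the assignment being described (the surrounding UMR
    setup code is an assumption; only the ib_qp/ib_qp_init_attr fields are the
    standard verbs ones):

        /* The UMR QP bypasses ib_create_qp(), so fill in the CQ pointers on
         * the generic ib_qp ourselves; reset/error flows dereference them. */
        qp = mlx5_ib_create_qp(pd, &init_attr, NULL);
        if (IS_ERR(qp))
                return PTR_ERR(qp);
        qp->send_cq = init_attr.send_cq;
        qp->recv_cq = init_attr.recv_cq;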
| * | | RDMA/cxgb4: Protect from possible dereference  (Leon Romanovsky, 2017-11-13; 1 file, -1/+1)
    Smatch tool reports the following error:

        drivers/infiniband/hw/cxgb4/qp.c:1886 c4iw_create_qp()
        error: we previously assumed 'ucontext' could be null (see line 1804)

    Cc: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
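    The general shape of the fix for this class of Smatch report (illustrative
    only; the function and helper are hypothetical, not the cxgb4 code):

        /* Once a pointer has been NULL-checked on one path, every later
         * dereference must sit behind the same check. */
        static int setup_user_qp(struct ib_ucontext *ucontext)
        {
                if (!ucontext)
                        return 0;       /* kernel QP: nothing user-visible */

                return do_user_setup(ucontext); /* hypothetical helper */
        }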
| * | | RDMA/bnxt_re: Remove unused vlan_tag variable  (Leon Romanovsky, 2017-11-13; 1 file, -5/+1)
    The Broadcom driver produces the following compilation warning:

        drivers/infiniband/hw/bnxt_re/ib_verbs.c: In function ‘bnxt_re_create_ah’:
        drivers/infiniband/hw/bnxt_re/ib_verbs.c:668:6: warning: variable ‘vlan_tag’ set but not used [-Wunused-but-set-variable]
          u16 vlan_tag;

    Let's remove it until vlan_tag is implemented properly.

    Cc: Selvin Xavier <selvin.xavier@broadcom.com>
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | IB/mlx5: Add PCI write end padding support  (Noa Osherovich, 2017-11-10; 3 files, -4/+38)
    Add the PCI write end padding flag to device_cap_flags enum and set it
    during mlx5_ib_query_device so it will be reported to user-space.

    During WQ/QP creation, set that capability for WQ/QP if user requested it
    and HW supports it. PCI write end padding modification is not supported for
    now. There's no such flag for a QP but for a WQ, create and modify use the
    same flag. Return an error if PCI write end padding flag is set during
    modify_wq.

    Signed-off-by: Noa Osherovich <noaos@mellanox.com>
    Reviewed-by: Majd Dibbiny <majd@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | IB/core: Add PCI write end padding flags for WQ and QP  (Noa Osherovich, 2017-11-10; 1 file, -1/+2)
    There are root complexes that are able to optimize their performance when
    incoming data is multiple full cache lines.

    PCI write end padding is the device's ability to pad the ending of incoming
    packets (scatter) to a full cache line such that the last upstream write
    generated by an incoming packet will be a full cache line.

    Add a relevant entry to ib_device_cap_flags to report such capability of an
    RDMA device.

    Add the QP and WQ create flags:
     * A QP/WQ created with a scatter end padding flag will cause HW to pad the
       last upstream write generated by a packet to a cache line.

    User should consider several factors before activating this feature:
     - In case of high CPU memory load (which may cause PCI back pressure in
       turn), if a large percent of the writes are partial cache line, this
       feature should be checked as an optional solution.
     - This feature might reduce performance if most packets are between one
       and two cache lines and PCIe throughput has reached its maximum
       capacity. E.g. a 65B packet from the network port will lead to a 128B
       write on PCIe, which may cause traffic on PCIe to reach high throughput.

    Signed-off-by: Noa Osherovich <noaos@mellanox.com>
    Reviewed-by: Majd Dibbiny <majd@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
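    A rough sketch of how a consumer might check and request the capability;
    the flag names follow the description above, but treat the exact spellings
    as assumptions:

        /* Request scatter end padding only when the device reports it. */
        static void request_end_padding(struct ib_device *dev,
                                        struct ib_qp_init_attr *init_attr)
        {
                if (dev->attrs.device_cap_flags & IB_DEVICE_PCI_WRITE_END_PADDING)
                        init_attr->create_flags |= IB_QP_CREATE_PCI_WRITE_END_PADDING;
        }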
| * | | IB/rxe: don't crash, if allocation of crc algorithm failed  (Thomas Bogendoerfer, 2017-11-10; 1 file, -4/+6)
    Following crash happens, if crc algorithm couldn't be allocated:

        [ 1087.989072] rdma_rxe: loaded
        [ 1097.855397] PCLMULQDQ-NI instructions are not detected.
        [ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
        [ 1097.901248] BUG: unable to handle kernel
        [ 1097.901249] NULL pointer dereference
        [ 1097.901250] at 0000000000000046
        [...]

    Reason is that rxe->tfm is assigned the error return, which will then be
    used for crypto_free_shash() in rxe_cleanup. Fix by using a temporary
    variable and assigning it to rxe->tfm only after allocation succeeded.

    Fixes: cee2688e3cd6 ("IB/rxe: Offload CRC calculation when possible")
    Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Acked-by: Moni Shoua <monis@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
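    The pattern described above, as a hedged sketch (rxe->tfm comes from the
    message; the surrounding code is illustrative):

        /* Keep the ERR_PTR in a local so rxe->tfm only ever holds a valid
         * handle, never an error value that crypto_free_shash() would later
         * be called on. */
        struct crypto_shash *tfm;

        tfm = crypto_alloc_shash("crc32", 0, 0);
        if (IS_ERR(tfm)) {
                pr_err("failed to allocate crc algorithm err: %ld\n",
                       PTR_ERR(tfm));
                return PTR_ERR(tfm);
        }
        rxe->tfm = tfm;         /* assign only after allocation succeeded */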
| * | | IB/core: Avoid crash on pkey enforcement failed in received MADs  (Parav Pandit, 2017-11-10; 1 file, -1/+2)
    The below kernel crash is observed when pkey security enforcement fails on
    received MADs. This issue is reported in [1].

    ib_free_recv_mad() accesses the rmpp_list, which must be initialized before
    it is accessed. When security enforcement fails on received MADs, MAD
    processing is skipped because the security checks failed.

        OpenSM[3770]: SM port is down
        kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
        kernel: IP: ib_free_recv_mad+0x44/0xa0 [ib_core]
        kernel: PGD 0
        kernel: P4D 0
        kernel: Oops: 0002 [#1] SMP
        kernel: CPU: 0 PID: 2833 Comm: kworker/0:1H Tainted: P IO 4.13.4-1-pve #1
        kernel: Hardware name: Dell XS23-TY3 /9CMP63, BIOS 1.71 09/17/2013
        kernel: Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
        kernel: task: ffffa069c6541600 task.stack: ffffb9a729054000
        kernel: RIP: 0010:ib_free_recv_mad+0x44/0xa0 [ib_core]
        kernel: RSP: 0018:ffffb9a729057d38 EFLAGS: 00010286
        kernel: RAX: ffffa069cb138a48 RBX: ffffa069cb138a10 RCX: 0000000000000000
        kernel: RDX: ffffb9a729057d38 RSI: 0000000000000000 RDI: ffffa069cb138a20
        kernel: RBP: ffffb9a729057d60 R08: ffffa072d2d49800 R09: ffffa069cb138ae0
        kernel: R10: ffffa069cb138ae0 R11: ffffa072b3994e00 R12: ffffb9a729057d38
        kernel: R13: ffffa069d1c90000 R14: 0000000000000000 R15: ffffa069d1c90880
        kernel: FS: 0000000000000000(0000) GS:ffffa069dba00000(0000) knlGS:0000000000000000
        kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        kernel: CR2: 0000000000000008 CR3: 00000011f51f2000 CR4: 00000000000006f0
        kernel: Call Trace:
        kernel:  ib_mad_recv_done+0x5cc/0xb50 [ib_core]
        kernel:  __ib_process_cq+0x5c/0xb0 [ib_core]
        kernel:  ib_cq_poll_work+0x20/0x60 [ib_core]
        kernel:  process_one_work+0x1e9/0x410
        kernel:  worker_thread+0x4b/0x410
        kernel:  kthread+0x109/0x140
        kernel:  ? process_one_work+0x410/0x410
        kernel:  ? kthread_create_on_node+0x70/0x70
        kernel:  ? SyS_exit_group+0x14/0x20
        kernel:  ret_from_fork+0x25/0x30
        kernel: RIP: ib_free_recv_mad+0x44/0xa0 [ib_core] RSP: ffffb9a729057d38
        kernel: CR2: 0000000000000008

    [1] https://www.spinics.net/lists/linux-rdma/msg56190.html

    Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams")
    Cc: stable@vger.kernel.org # 4.13+
    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reported-by: Chris Blake <chrisrblake93@gmail.com>
    Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
    Reviewed-by: Hal Rosenstock <hal@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
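    Not the literal diff, just the rule it illustrates: a list head must be
    initialized before any error path that may walk it (the field names are
    taken from the message and from the ib_mad structures):

        /* Initialize rmpp_list as soon as the receive work completion is
         * built, so ib_free_recv_mad() can safely traverse it even when the
         * MAD is dropped early, e.g. on a pkey security failure. */
        INIT_LIST_HEAD(&recv->header.recv_wc.rmpp_list);
        /* ... security / validation checks may bail out here ... */
        ib_free_recv_mad(&recv->header.recv_wc);        /* walks rmpp_list */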
| * | | RDMA/cxgb4: Annotate r2 and stag as __be32  (Leon Romanovsky, 2017-11-10; 1 file, -2/+2)
    Chelsio cxgb4 HW is big-endian, hence there is need to properly annotate
    r2 and stag fields as __be32 and not __u32 to fix the following sparse
    warnings:

        drivers/infiniband/hw/cxgb4/qp.c:614:16: warning: incorrect type in assignment (different base types)
            expected unsigned int [unsigned] [usertype] r2
            got restricted __be32 [usertype] <noident>
        drivers/infiniband/hw/cxgb4/qp.c:615:18: warning: incorrect type in assignment (different base types)
            expected unsigned int [unsigned] [usertype] stag
            got restricted __be32 [usertype] <noident>

    Cc: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
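    What the annotation change amounts to, sketched with a hypothetical field
    layout (not the cxgb4 firmware structures):

        /* On-the-wire fields carry an endianness annotation so sparse can
         * verify that only byte-swapped values are stored in them. */
        struct wqe_hdr {                /* hypothetical layout */
                __be32 r2;
                __be32 stag;
        };

        static void fill_hdr(struct wqe_hdr *hdr, u32 stag)
        {
                hdr->r2   = cpu_to_be32(0);
                hdr->stag = cpu_to_be32(stag);  /* host order -> big endian */
        }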
| * | | IB/mlx4: Fix RSS's QPC attributes assignments  (Guy Levi, 2017-11-10; 1 file, -7/+9)
    In the modify QP handler, the base_qpn_udp field in the RSS QPC is
    overwritten later by an irrelevant value assignment. Hence, ingress packets
    which get to the RSS QP will be steered to a garbage QPN.

    The patch fixes this by skipping the above assignment when an RSS QP is
    modified; also, the RSS context's attribute assignments are relocated just
    before the context is posted, to avoid future issues like this.

    Additionally, this patch takes the opportunity to make the code follow the
    device's manual and assigns the RSS QP context only at the RESET to INIT
    transition.

    Fixes: 3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
    Signed-off-by: Guy Levi <guyle@mellanox.com>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | IB/mlx4: Add report for RSS capabilities by vendor channel  (Guy Levi, 2017-11-10; 2 files, -5/+28)
    The mlx4 RSS patch series missed reporting the RSS capabilities which
    should be reported by the vendor channel in query_device.

    Signed-off-by: Guy Levi <guyle@mellanox.com>
    Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/umem: Avoid partial declaration of non-static function  (Leon Romanovsky, 2017-11-10; 3 files, -110/+73)
    The RDMA/umem code uses generic RB-tree macros to generate various ib_umem
    access functions. The generation is performed with the INTERVAL_TREE_DEFINE
    macro, which allows one of two modes: declare all functions as static, or
    declare none of the functions static. The second mode of operation produces
    the following sparse errors:

        drivers/infiniband/core/umem_rbtree.c:69:1: warning: symbol 'rbt_ib_umem_iter_first' was not declared. Should it be static?
        drivers/infiniband/core/umem_rbtree.c:69:1: warning: symbol 'rbt_ib_umem_iter_next' was not declared. Should it be static?

    Code relocation together with declaring such functions "static" solves the
    issue. Because there is no need to have a separate file for two functions,
    consolidate umem_rbtree.c and umem_odp.c into one file.

    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
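    The macro's two modes, sketched with a hypothetical node type (argument
    order as in include/linux/interval_tree_generic.h; treat the details as
    assumptions, not the umem code):

        struct umem_node {                      /* hypothetical node type */
                struct rb_node rb;
                u64 start, last;
                u64 __subtree_last;
        };

        #define NODE_START(n) ((n)->start)
        #define NODE_LAST(n)  ((n)->last)

        /* Passing "static" as the 7th argument makes every generated function
         * (prefix_insert, _remove, _iter_first, _iter_next) static, which
         * silences the sparse warnings once all users live in the same file. */
        INTERVAL_TREE_DEFINE(struct umem_node, rb, u64, __subtree_last,
                             NODE_START, NODE_LAST, static, umem_tree)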
| * | | RDMA/hns: Modify the usage of cmd_sn in hip08  (oulijun, 2017-11-10; 3 files, -1/+4)
    The cmd_sn field of the CQ doorbell is initialized to 0. It should be
    incremented on the first doorbell ring after each completion event. If the
    cmd_sn of two adjacent notify doorbells is the same, the hardware treats
    them as the same notify request and updates the type according to the
    priority level of the next event and solicited event.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Unify the calculation for hem index in hip08  (oulijun, 2017-11-10; 1 file, -4/+6)
    The calculation of the hem index differs between hns_roce_table_get and
    hns_roce_table_find. When the table chunk size of TRRL is not divisible by
    the object size, it will fail to find the TRRL table. This patch updates
    the calculation of the hem index in hns_roce_table_find to match the one
    in hns_roce_table_get.

    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Set the owner field of SQWQE in hip08 RoCE  (oulijun, 2017-11-10; 1 file, -0/+5)
    The owner bit needs to be set when posting an SQ WQE in hip08 RoCE. It is
    used according to the following algorithm: the value of owner should be 1
    in the first lap over the SQ ring, 0 in the second lap, and so on,
    alternating with each lap.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
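    A hedged sketch of the alternating-owner idea (illustrative arithmetic
    only, not the hip08 WQE layout; the names are made up):

        /* With a power-of-two SQ depth, head >> log_sq_depth is the lap count,
         * so the owner bit simply alternates 1, 0, 1, 0, ... per lap. */
        static u32 sq_owner_bit(u32 head, u32 log_sq_depth)
        {
                return ~(head >> log_sq_depth) & 0x1;   /* lap 0 -> 1, lap 1 -> 0 */
        }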
| * | | RDMA/hns: Add sq_invld_flg field in QP context  (oulijun, 2017-11-10; 2 files, -2/+6)
    In hip08 RoCE, the sq_invld_flg field needs to be added to the QP context
    for the RoCE hardware.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Update the usage of ack timeout in hip08  (oulijun, 2017-11-10; 1 file, -7/+4)
    The ack timeout in the QP context is a 5-bit value assigned by users. When
    the ack timeout field of the QPC is set to zero, the timer is disabled. The
    ack timeout field only takes effect when IB_QP_TIMEOUT is set in attr_mask.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
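    A minimal sketch of the check being described, using the generic verbs
    attribute (attr->timeout from struct ib_qp_attr); the qpc field name is a
    made-up placeholder, not the hip08 register layout:

        /* Apply the 5-bit ack timeout only when the caller set IB_QP_TIMEOUT;
         * leaving the field 0 keeps the timer disabled. */
        if (attr_mask & IB_QP_TIMEOUT)
                qpc->ack_timeout = attr->timeout & 0x1f;        /* 5-bit field */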
| * | | RDMA/hns: Set sq_cur_sge_blk_addr field in QPC in hip08  (oulijun, 2017-11-10; 2 files, -1/+17)
    If extended SGEs exist, the sq_cur_sge_blk_addr field in the QPC (QP
    context) should be configured.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Enable the cqe field of sqwqe of RC  (oulijun, 2017-11-10; 1 file, -2/+3)
    When the sig_type of the QPC is non-selectable, all of the SQ's WQEs
    produce CQEs regardless of the CQE attribute of the WQE. When the sig_type
    of the QPC is selectable, the CQE attribute of the WQE decides whether a
    CQE is produced.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Set se attribute of sqwqe in hip08  (oulijun, 2017-11-10; 1 file, -0/+3)
    When the send flag is IB_SEND_SOLICITED, the SE (solicited event) field of
    the SQ WQE will be set.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Configure fence attribute in hip08 RoCE  (oulijun, 2017-11-10; 1 file, -1/+3)
    When posting WRs for mixed RDMA operations, the fence mechanism is needed
    to keep the correct execution order.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Configure TRRL field in hip08 RoCE device  (oulijun, 2017-11-10; 7 files, -2/+68)
    The TRRL (Target RDMA Read/Atomic List) records information about received
    RDMA READ and ATOMIC operations in hip08 and is used by the hardware. The
    driver needs to assign a contiguous physical address to the trrl_ba field
    of the QP context.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Update calculation of irrl_ba field for hip08  (oulijun, 2017-11-10; 1 file, -2/+2)
    The IRRL (Initiator RDMA Read/Atomic List) base address in the QP context
    should be assigned from addr[63:6]. This patch fixes that.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Configure sgid type for hip08 RoCE  (Wei Hu(Xavier), 2017-11-10; 5 files, -12/+42)
    The hardware needs to generate RoCEv1 or RoCEv2 packets according to the
    configured sgid type. Besides, update the gid table size for the hip08
    RoCE device.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Generate gid type of RoCEv2  (Wei Hu(Xavier), 2017-11-10; 3 files, -3/+6)
    HNS_ROCE_CAP_FALG_ROCE_V1_V2 is added for selecting the RoCE capability in
    the hns driver. When it is set, the driver informs the IB core that the
    related hns device supports RoCEv2, and the IB core can then generate GIDs
    of the related type.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | RDMA/hns: Add rereg mr support for hip08  (Wei Hu(Xavier), 2017-11-10; 5 files, -0/+195)
    This patch adds rereg mr support for hip08.

    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Yixian Liu <liuyixian@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | Merge branch 'k.o/for-rc' into k.o/for-next  (Doug Ledford, 2017-11-01; 1 file, -1/+12)
    Pick up the missing netlink oops fix.

    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/hfi1: Take advantage of kvzalloc_node in sdma initialization  (Mike Marciniszyn, 2017-10-30; 1 file, -7/+2)
    The code that allocates the tx ring in the sdma code fails to take
    advantage of kvzalloc variations. Fix by converting to use kvzalloc_node.

    Reported-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
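    The conversion being described, in sketch form (the sdma field names are
    assumptions, not the hfi1 code):

        /* kvzalloc_node() tries kmalloc first and falls back to vmalloc,
         * returning zeroed memory on the requested NUMA node; kvfree() later
         * handles either backing store. */
        sde->tx_ring = kvzalloc_node(sizeof(*sde->tx_ring) * descq_cnt,
                                     GFP_KERNEL, dd->node);
        if (!sde->tx_ring)
                return -ENOMEM;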
| * | | | IB/hfi1: Don't modify num_user_contexts module parameter  (Kamenee Arumugam, 2017-10-30; 1 file, -14/+15)
    The driver parameter num_user_contexts controls global behavior and should
    not be modified by the driver. This patch eliminates modification of
    num_user_contexts by using a local variable to keep track of the value.

    Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
    Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
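    The usual shape of this kind of fix, sketched with hypothetical names
    (HFI1_MAX_USER_CTXTS, dd->chip_rcv_contexts, dd->num_user_contexts and the
    helper are assumptions):

        static uint num_user_contexts = HFI1_MAX_USER_CTXTS;
        module_param(num_user_contexts, uint, 0444);

        static void setup_user_contexts(struct hfi1_devdata *dd)
        {
                uint n_usr_ctxts = num_user_contexts;   /* local working copy */

                /* Clamp the copy, never the module parameter itself, so the
                 * value visible in sysfs stays what the admin requested. */
                if (n_usr_ctxts > dd->chip_rcv_contexts)
                        n_usr_ctxts = dd->chip_rcv_contexts;
                dd->num_user_contexts = n_usr_ctxts;
        }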
| * | | | IB/hfi1: Insure int mask for in-kernel receive contexts is clear  (Mike Marciniszyn, 2017-10-30; 1 file, -3/+32)
    The only use for the urg interrupt is for priority PSM packets. There is no
    reason for this interrupt to be enabled for kernel contexts.

    Reviewed-by: Kaike Wan <kaike.wan@intel.com>
    Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/hfi1: Add tx_opcode_stats like the opcode_stats  (Mike Marciniszyn, 2017-10-30; 4 files, -10/+83)
    This patch adds tx_opcode_stats to parallel the (rx)opcode_stats in the
    debugfs.

    Reviewed-by: Kaike Wan <kaike.wan@intel.com>
    Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/hfi1: Validate PKEY for incoming GSI MAD packets  (Sebastian Sanchez, 2017-10-30; 1 file, -2/+84)
    These are the use-cases where the pkey needs to be tested to see if a
    packet needs to be dropped:

    a) If the pkey is neither FULL_MGMT_P_KEY nor LIM_MGMT_P_KEY, drop the
       packet as it's not part of the management partition. Self-originated
       packets are an exception.
    b) If the pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is in
       the table, the packet is coming from a management node, and the
       receiving node is also a management node, so it is safe for the packet
       to go through.
    c) If the pkey index points to FULL_MGMT_P_KEY and LIM_MGMT_P_KEY is NOT
       in the table, drop the packet as LIM_MGMT_P_KEY should always be in the
       pkey table. It could be a misconfiguration.
    d) If the pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is NOT
       in the table, it is safe for the packet to go through since a
       non-management node is talking to another non-management node.
    e) If the pkey index points to LIM_MGMT_P_KEY and FULL_MGMT_P_KEY is in
       the table, drop the packet because a non-management node is talking to
       a management node, and it could be an attack.

    For the implementation, these rules can be simplified to only checking for
    (a) and (e). There's no need to check for rule (b) as the packet doesn't
    need to be dropped. Rule (c) is not possible in the driver as
    LIM_MGMT_P_KEY is always in the pkey table.

    Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
    Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
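    A condensed sketch of the two checks that remain after the simplification,
    (a) and (e); the helper and its arguments are assumptions, not the hfi1
    code:

        /* Return true if the incoming GSI MAD should be dropped. */
        static bool drop_gsi_mad(u16 pkey, bool full_mgmt_in_table,
                                 bool self_originated)
        {
                /* (a) not a management pkey at all; self-originated packets
                 * are exempt */
                if (pkey != FULL_MGMT_P_KEY && pkey != LIM_MGMT_P_KEY)
                        return !self_originated;

                /* (e) limited-member sender talking to a node that holds the
                 * full management pkey */
                if (pkey == LIM_MGMT_P_KEY && full_mgmt_in_table)
                        return true;

                return false;
        }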
| * | | | Ib/hfi1: Return actual operational VLs in port info query  (Patel Jay P, 2017-10-30; 1 file, -1/+1)
    __subn_get_opa_portinfo stores the value returned by hfi1_get_ib_cfg() as
    operational vls. hfi1_get_ib_cfg() returns the vls_operational field in
    hfi1_pportdata. The problem with this is that the value is always equal to
    the vls_supported field in hfi1_pportdata.

    The logic to calculate operational_vls is to set the value passed by the
    FM (in the __subn_set_opa_portinfo routine); if no value is passed, then
    the default value is stored in operational_vls. The actual_vls_operational
    field is calculated on the basis of the buffer control table. Hence, modify
    hfi1_get_ib_cfg() to return actual_vls_operational when used with the
    HFI1_IB_CFG_OP_VLS parameter.

    Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Patel Jay P <jay.p.patel@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/hfi1: Race condition between user notification and driver state  (Michael J. Ruhl, 2017-10-30; 2 files, -10/+31)
    The handler for the link init state (HLS_UP_INIT) notifies userspace
    (update_statusp()) before enabling the device (RCV_CTRL_RCV_PORT_ENABLE_SMASK)
    or setting the device state (ppd->host_link_state).

    This causes a race condition where the userspace thinks the interface is in
    the INIT state before the driver has set that state. Rework the code path
    to eliminate the race. Delay setting the init state until after a HW
    settling period.

    Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | bnxt_re: Implement the shutdown hook of the L2-RoCE driver interface  (Somnath Kotur, 2017-10-25; 1 file, -1/+13)
    When the host is shutting down, it invokes the shutdown hook of the L2
    driver, which would attempt to free the MSI-X vectors but would fail
    because some vectors are held by the RoCE driver.

    Implement the new hook in the L2 -> RoCE interface which will be invoked so
    that the RoCE driver can unregister the device and free up the MSI-X
    vectors it had claimed, so that L2 can proceed with its shutdown without
    failure.

    Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | RDMA/cxgb4: Declare stag as __be32  (Leon Romanovsky, 2017-10-25; 1 file, -1/+1)
    The scqe.stag is actually __be32, fix it.

        drivers/infiniband/hw/cxgb4/cq.c:754:52: warning: cast to restricted __be32

    Cc: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Reviewed-by: Steve Wise <swise@opengridcomputing.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/rxe: Convert timers to use timer_setup()  (Kees Cook, 2017-10-25; 4 files, -8/+8)
    In preparation for unconditionally passing the struct timer_list pointer to
    all timer callbacks, switch to using the new timer_setup() and from_timer()
    to pass the timer pointer explicitly.

    Cc: Moni Shoua <monis@mellanox.com>
    Cc: Doug Ledford <dledford@redhat.com>
    Cc: Sean Hefty <sean.hefty@intel.com>
    Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
    Cc: linux-rdma@vger.kernel.org
    Signed-off-by: Kees Cook <keescook@chromium.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
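    The general before/after shape of a timer_setup() conversion (the struct
    and callback names here are hypothetical, not the rxe code):

        /* New-style callback: it receives the timer itself and recovers the
         * containing object with from_timer() instead of an unsigned long
         * data cookie. */
        static void retrans_timeout(struct timer_list *t)
        {
                struct my_qp *qp = from_timer(qp, t, retrans_timer);

                handle_retransmit(qp);          /* hypothetical work */
        }

        /* Registration: timer_setup(timer, callback, flags). */
        timer_setup(&qp->retrans_timer, retrans_timeout, 0);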
| * | | | IB/cm: Fix memory corruption in handling CM request  (Parav Pandit, 2017-10-25; 1 file, -4/+7)
    In recent code, two path record entries are always cleared even though
    either one or two path record entries could have been allocated. This leads
    to zeroing out unallocated memory. This fix initializes the alternative
    path record only when the alternative path is set.

    While we are at it: the path record allocation doesn't check for an OPA
    alternative path, but the rest of the code checks for an OPA alternative
    path. The path record allocation code doesn't check for an OPA alternative
    LID. This can further lead to memory corruption when only one path record
    is allocated, but an alternative OPA path record is actually present in the
    CM request.

    Cc: <stable@vger.kernel.org> # v4.12+
    Fixes: 9fdca4da4d8c ("IB/SA: Split struct sa_path_rec based on IB and ROCE specific fields")
    Signed-off-by: Parav Pandit <parav@mellanox.com>
    Reviewed-by: Moni Shoua <monis@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/mlx5: Add support for RSS on the inner packet  (Maor Gottlieb, 2017-10-25; 1 file, -0/+11)
    Some user-space applications would like to do RSS on the inner packet
    fields instead of on the outer. When MLX5_RX_HASH_INNER is set along with
    one or more of the other hash fields, the RSS will be done using the inner
    packet.

    Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
    Reviewed-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/mlx5: Add tunneling offloads support  (Maor Gottlieb, 2017-10-25; 3 files, -5/+51)
    The device can support receive stateless offloads for the inner packet's
    fields only when the packet is processed by a TIR which is enabled to
    support tunneling. Otherwise, the device treats the packet as an ordinary
    non-tunneling packet and receive offloads can be done only for the outer
    packet's fields.

    In order to enable receive stateless offloading support for incoming
    tunneling traffic, the TIR should be created with tunneled_offload_en.
    Tunneling offloads are supported only by raw Ethernet QPs.

    This patch includes:
    * New QP creation flag for tunneling offloads.
    * Reports device capabilities.

    Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
    Reviewed-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/mlx5: Support padded 128B CQE feature  (Guy Levi, 2017-10-25; 3 files, -4/+32)
    In some benchmarks and on some CPU architectures, writing the CQE on a full
    cache line size improves performance by saving memory access operations
    (read-modify-write) relative to a partial cache line change.

    This patch lets the user configure the device to pad the CQE up to 128B in
    case its content is less than 128B. Currently the driver supports padding
    only for a CQE size of 128B.

    Signed-off-by: Guy Levi <guyle@mellanox.com>
    Reviewed-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/mlx5: Support 128B CQE compression feature  (Guy Levi, 2017-10-25; 2 files, -4/+10)
    Commit 1cbe6fc86ccf ("IB/mlx5: Add support for CQE compressing") introduced
    the concept of CQE compression and added support for the 64B CQE size. This
    change updates the code to support the 128B CQE size as well.

    Signed-off-by: Guy Levi <guyle@mellanox.com>
    Reviewed-by: Mark Bloch <markb@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/mlx5: Allow creation of a multi-packet RQ  (Noa Osherovich, 2017-10-25; 2 files, -8/+48)
    Allow creation of a multi-packet receive queue.

    In order to create a multi-packet RQ, the following fields in the
    mlx5_ib_rwq should be set:
    - log_num_strides: Log of number of strides per WQE
    - single_stride_log_num_of_bytes: Log of a single stride size
    - two_byte_shift_en: When enabled, hardware pads 2 bytes of zeros before
      writing the message to memory (e.g. for the IP alignment).

    Signed-off-by: Noa Osherovich <noaos@mellanox.com>
    Reviewed-by: Majd Dibbiny <majd@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | IB/mlx5: Expose multi-packet RQ capabilities  (Noa Osherovich, 2017-10-25; 2 files, -0/+21)
    This patch reports the device's striding RQ capabilities to user-space:
    - min/max_single_stride_log_num_of_bytes: Log of min/max number of bytes
      in a single stride.
    - min/max_single_wqe_log_num_of_strides: Log of min/max number of strides
      in a single WQE.
    - supported_qpts: A bit mask to know which QP types support multi-packet
      RQ; for now, only Raw Packet QPs.

    Signed-off-by: Noa Osherovich <noaos@mellanox.com>
    Reviewed-by: Majd Dibbiny <majd@mellanox.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | RDMA/hns: Add modify CQ support for hip08  (oulijun, 2017-10-25; 5 files, -0/+51)
    The modify CQ API is needed for modifying CQ context fields that control
    event generation moderation. This patch adds it.

    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | RDMA/hns: Update the PD&CQE&MTT specification in hip08  (Wei Hu(Xavier), 2017-10-25; 1 file, -4/+4)
    This patch updates the PD specification to 16M for hip08, and it updates
    the numbers of mtt and cqe segments for the buddy. As the CQE supports hop
    num 1 addressing, the CQE specification is 64K, so this patch updates the
    CQE specification to 64K as well.

    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | | | RDMA/hns: Update the IRRL table chunk size in hip08  (Wei Hu(Xavier), 2017-10-25; 6 files, -16/+21)
    As the IRRL specification increased in hip08, the IRRL table chunk size
    needs to be updated. This patch updates the IRRL table chunk size to 256K
    for hip08.

    Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
    Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
    Signed-off-by: Lijun Ou <oulijun@huawei.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>