path: root/drivers/net/ethernet/mellanox/mlx5/core/en
Commit message (Author, Age, Files, Lines)
* net/mlx5e: Fix a race with XSKICOSQ in XSK wakeup flow (Maxim Mikityanskiy, 2019-08-15, 1 file, -0/+3)
  Add a missing spinlock around XSKICOSQ usage at the activation stage, because there is a race between a configuration change and the application calling sendto().
  Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
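  As a rough illustration of the locking this entry describes (the lock, field, and helper names below are assumptions for the sketch, not necessarily the exact driver symbols): the activation path must take the same spinlock the sendto()-driven wakeup path takes before posting to the XSK ICOSQ.

  /* Activation vs. sendto() race: both paths touch the XSK ICOSQ, so both
   * must hold the same lock. Names are illustrative. */
  static void channel_activate_xskicosq(struct mlx5e_channel *c)
  {
          spin_lock(&c->xskicosq_lock);    /* same lock the wakeup path takes */
          mlx5e_trigger_irq(&c->xskicosq); /* kick NAPI for the activated SQ  */
          spin_unlock(&c->xskicosq_lock);
  }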
* net/mlx5e: Remove redundant check in CQE recovery flow of tx reporter (Aya Levin, 2019-08-08, 1 file, -3/+0)
  Remove the check of the recovery bit at the beginning of the CQE recovery function. This test is already performed right before the reporter is invoked, when a CQE error is detected.
  Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
  Signed-off-by: Aya Levin <ayal@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Fix error flow of CQE recovery on tx reporter (Aya Levin, 2019-08-08, 1 file, -4/+8)
  The CQE recovery function begins with test-and-set of the recovery bit. Add an error flow which ensures clearing of this bit when leaving the recovery function, to allow further recoveries to take place. This allows removal of clearing the recovery bit on sq activate.
  Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
  Signed-off-by: Aya Levin <ayal@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
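  A simplified sketch of the take/release discipline this implies (the recovery body is a placeholder; only the state-bit pattern is the point, and it reuses the driver's existing MLX5E_SQ_STATE_RECOVERING flag name):

  /* Take the recovery bit on entry; release it on every exit path, including
   * errors, so a later recovery attempt is not permanently blocked. */
  static int tx_cqe_recover(struct mlx5e_txqsq *sq)
  {
          int err;

          if (test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state))
                  return 0;                       /* a recovery is already running */

          err = do_recovery_steps(sq);            /* placeholder for the real flow */

          clear_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state); /* previously leaked on error */
          return err;
  }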
* net/mlx5e: Fix false negative indication on tx reporter CQE recovery (Aya Levin, 2019-08-08, 1 file, -4/+2)
  Remove the wrong error return value when the SQ is not in error state. CQE recovery on the TX reporter queries the SQ state. If the SQ is not in error state, it is either in ready or reset state: ready is a good state which doesn't require recovery, and reset is a transient state which ends in ready state. With this patch, CQE recovery in this scenario is successful.
  Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
  Signed-off-by: Aya Levin <ayal@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Fix matching of speed to PRM link modes (Aya Levin, 2019-07-25, 2 files, -11/+22)
  Speed translation is performed based on the legacy or extended PTYS register. Translate speed with respect to:
  1) The capability bit of the extended PTYS table.
  2) The user request:
     a) When auto-negotiation is turned on, inspect the advertisement for extended link modes.
     b) When auto-negotiation is turned off, speed > 100Gbps (the maximal speed supported in legacy mode).
  With both conditions fulfilled, translation is done with the extended PTYS table; otherwise the legacy PTYS table is used. Without this patch, 25/50/100 Gbps speeds cannot be set, since the driver tries to configure in extended mode but reads from legacy mode.
  Fixes: dd1b9e09c12b ("net/mlx5: ethtool, Allow legacy link-modes configuration via non-extended ptys")
  Signed-off-by: Aya Levin <ayal@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
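  The selection rule above boils down to a small predicate; an illustrative, self-contained sketch (the function and parameter names are made up for clarity, only the decision logic follows the text):

  /* Decide whether the extended PTYS table should be used for speed translation. */
  static bool use_extended_ptys(bool ext_capable, bool autoneg,
                                bool advertised_ext_modes, unsigned int speed_mbps)
  {
          if (!ext_capable)
                  return false;   /* device only implements the legacy table */

          /* autoneg on: follow what the user advertises;
           * autoneg off: legacy tops out at 100Gbps, anything above needs extended. */
          return autoneg ? advertised_ext_modes : speed_mbps > 100000;
  }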
* net/mlx5e: Fix wrong max num channels indication (Tariq Toukan, 2019-07-25, 1 file, -2/+3)
  No XSK support in the enhanced IPoIB driver and representors. Add a profile property to specify this, and enhance the logic that calculates the max number of channels to take it into account.
  Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* Merge tag 'mlx5-fixes-2019-07-11' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (David S. Miller, 2019-07-12, 1 file, -6/+4)
  Saeed Mahameed says:
  ====================
  Mellanox, mlx5 fixes 2019-07-11
  This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem.
  For -stable v4.15:
    ('net/mlx5e: IPoIB, Add error path in mlx5_rdma_setup_rn')
  For -stable v5.1:
    ('net/mlx5e: Fix port tunnel GRE entropy control')
    ('net/mlx5e: Rx, Fix checksum calculation for new hardware')
    ('net/mlx5e: Fix return value from timeout recover function')
    ('net/mlx5e: Fix error flow in tx reporter diagnose')
  For -stable v5.2:
    ('net/mlx5: E-Switch, Fix default encap mode')
  Conflict note: This pull request will produce a small conflict when merged with net-next. In drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c take the hunk from net and replace:
    esw_offloads_steering_init(esw, vf_nvports, total_nvports);
  with:
    esw_offloads_steering_init(esw);
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Fix error flow in tx reporter diagnose (Aya Levin, 2019-07-11, 1 file, -2/+2)
  Fix tx reporter's diagnose callback. Propagate error when failing to gather diagnostics information or failing to print diagnostic data per queue.
  Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
  Signed-off-by: Aya Levin <ayal@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Jiri Pirko <jiri@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Fix return value from timeout recover function (Aya Levin, 2019-07-11, 1 file, -4/+2)
  Fix timeout recover function to return a meaningful return value. When an interrupt was not sent by the FW, return IO error instead of 'true'.
  Fixes: c7981bea48fb ("net/mlx5e: Fix return status of TX reporter timeout recover")
  Signed-off-by: Aya Levin <ayal@mellanox.com>
  Acked-by: Jiri Pirko <jiri@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload (Pablo Neira Ayuso, 2019-07-09, 5 files, -23/+23)
  And any other existing fields in this structure that refer to tc. Specifically:
  * tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
  * TC_CLSFLOWER_* to FLOW_CLS_*.
  * tc_cls_common_offload to flow_cls_common_offload.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5e: Add kTLS TX HW offload support (Tariq Toukan, 2019-07-06, 1 file, -1/+7)
  Add support for transmit side kernel-TLS acceleration. Offload the crypto encryption to HW.
  Per TLS connection:
  - Use a separate TIS to maintain the HW context.
  - Use a separate encryption key.
  - Maintain static and progress HW contexts by posting the proper WQEs at creation time, or upon resync.
  - Use a special DUMP opcode to replay the previous frags and sync the HW context.
  To make sure the SQ is able to serve an xmit request, increase SQ stop room to cover:
  - static params WQE,
  - progress params WQE, and
  - resync DUMP per frag.
  Currently supporting TLS 1.2, and key size 128bit. Tested over SimX simulator.
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
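  The stop-room rule in the list above can be read as a worst-case WQEBB budget per transmitted skb; a hedged sketch (all sizes are illustrative parameters, not the driver's real constants):

  /* Worst case per skb with kTLS: static params + progress params + one DUMP
   * WQE per frag that may need replaying, on top of the regular data WQE. */
  static u16 tls_stop_room(u16 static_params_wqebbs, u16 progress_params_wqebbs,
                           u16 dump_wqebbs)
  {
          return static_params_wqebbs +
                 progress_params_wqebbs +
                 MAX_SKB_FRAGS * dump_wqebbs;     /* resync DUMP per frag */
  }

  /* usage sketch: sq->stop_room = base_stop_room + tls_stop_room(...); */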
* | net/mlx5e: Introduce a fenced NOP WQE posting function (Tariq Toukan, 2019-07-06, 1 file, -0/+18)
  Similar to the existing mlx5e_post_nop(), but marks a fence in the WQE control segment. Added as a separate new function to not hurt the performance of the common case. To be used in a downstream patch of the series.
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Reviewed-by: Boris Pismenny <borisp@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
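  A sketch of what such a helper could look like, assuming mlx5e_post_nop() returns the posted WQE and that the fence is the existing MLX5_FENCE_MODE_INITIATOR_SMALL flag in the control segment (the wrapper form is illustrative, not the exact driver implementation):

  /* Post a NOP whose control segment carries the small-initiator fence, so it
   * completes only after the preceding WQEs. */
  static inline struct mlx5e_tx_wqe *
  post_nop_fence(struct mlx5_wq_cyc *wq, u32 sqn, u16 *pc)
  {
          struct mlx5e_tx_wqe *wqe = mlx5e_post_nop(wq, sqn, pc);

          wqe->ctrl.fm_ce_se |= MLX5_FENCE_MODE_INITIATOR_SMALL;
          return wqe;
  }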
* | net/mlx5e: Tx, Unconstify SQ stop room (Tariq Toukan, 2019-07-06, 1 file, -0/+14)
  Use an SQ field for stop_room, and use the larger value only if TLS is supported.
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: Tx, Make SQ WQE fetch function type generic (Tariq Toukan, 2019-07-06, 1 file, -5/+7)
  Change mlx5e_sq_fetch_wqe to be agnostic to the Work Queue Element (WQE) type. Before this patch, it was specific for struct mlx5e_tx_wqe. In order to allow the change, the function now returns the generic void pointer, and gets the WQE size to do the zero memset.
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
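  In other words, the WQE type and its size move to the caller. A sketch of that shape (the body is an assumption based on the description above, not the exact driver code; the mlx5_wq_cyc helpers are existing ones):

  static inline void *sq_fetch_wqe(struct mlx5e_txqsq *sq, size_t size, u16 *pi)
  {
          void *wqe;

          *pi = mlx5_wq_cyc_ctr2ix(&sq->wq, sq->pc); /* producer counter -> WQ index */
          wqe = mlx5_wq_cyc_get_wqe(&sq->wq, *pi);
          memset(wqe, 0, size);                      /* caller says how much to zero */
          return wqe;
  }

  /* usage sketch: struct mlx5e_tx_wqe *wqe = sq_fetch_wqe(sq, sizeof(*wqe), &pi); */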
* | net/mlx5e: Tx, Enforce L4 inline copy when needed (Tariq Toukan, 2019-07-06, 1 file, -0/+5)
  When ctrl->tisn field exists, this indicates an operation (HW offload) on the TCP payload. For such WQEs, inline the headers up to L4. This is in preparation for kTLS HW offload support, added in a downstream patch.
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: Move helper functions to a new txrx datapath header (Tariq Toukan, 2019-07-06, 2 files, -0/+164)
  Take datapath helper functions to a new header file en/txrx.h.
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next (David S. Miller, 2019-07-04, 13 files, -114/+1271)
  Daniel Borkmann says:
  ====================
  pull-request: bpf-next 2019-07-03
  The following pull-request contains BPF updates for your *net-next* tree.
  There is a minor merge conflict in mlx5 due to 8960b38932be ("linux/dim: Rename externally used net_dim members") which has been pulled into your tree in the meantime, but resolution seems not that bad ... getting current bpf-next out now before there's coming more on mlx5. ;) I'm Cc'ing Saeed just so he's aware of the resolution below:
  ** First conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
  <<<<<<< HEAD
  static int mlx5e_open_cq(struct mlx5e_channel *c, struct dim_cq_moder moder,
                           struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
  =======
  int mlx5e_open_cq(struct mlx5e_channel *c, struct net_dim_cq_moder moder,
                    struct mlx5e_cq_param *param, struct mlx5e_cq *cq)
  >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497
  Resolution is to take the second chunk and rename net_dim_cq_moder into dim_cq_moder. Also the signature for mlx5e_open_cq() in drivers/net/ethernet/mellanox/mlx5/core/en.h +977 and in mlx5e_open_xsk() in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c +64 needs the same rename from net_dim_cq_moder into dim_cq_moder.
  ** Second conflict in drivers/net/ethernet/mellanox/mlx5/core/en_main.c:
  <<<<<<< HEAD
  int cpu = cpumask_first(mlx5_comp_irq_get_affinity_mask(priv->mdev, ix));
  struct dim_cq_moder icocq_moder = {0, 0};
  struct net_device *netdev = priv->netdev;
  struct mlx5e_channel *c;
  unsigned int irq;
  =======
  struct net_dim_cq_moder icocq_moder = {0, 0};
  >>>>>>> e5a3e259ef239f443951d401db10db7d426c9497
  Take the second chunk and rename net_dim_cq_moder into dim_cq_moder as well. Let me know if you run into any issues.
  Anyway, the main changes are:
  1) Long-awaited AF_XDP support for mlx5e driver, from Maxim.
  2) Addition of two new per-cgroup BPF hooks for getsockopt and setsockopt along with a new sockopt program type which allows more fine-grained pass/reject settings for containers. Also add a sock_ops callback that can be selectively enabled on a per-socket basis and is executed for every RTT to help tracking TCP statistics, both features from Stanislav.
  3) Follow-up fix from loops in precision tracking which was not propagating precision marks and as a result verifier assumed that some branches were not taken and therefore wrongly removed as dead code, from Alexei.
  4) Fix BPF cgroup release synchronization race which could lead to a double-free if a leaf's cgroup_bpf object is released and a new BPF program is attached to one of the ancestor cgroups in parallel, from Roman.
  5) Support for bulking XDP_TX on veth devices which improves performance in some cases by around 9%, from Toshiaki.
  6) Allow for lookups into BPF devmap and improve feedback when calling into bpf_redirect_map() as lookup is now performed right away in the helper itself, from Toke.
  7) Add support for fq's Earliest Departure Time to the Host Bandwidth Manager (HBM) sample BPF program, from Lawrence.
  8) Various cleanups and minor fixes all over the place from many others.
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/mlx5e: Add XSK zero-copy support (Maxim Mikityanskiy, 2019-06-27, 13 files, -52/+1091)
  This commit adds support for AF_XDP zero-copy RX and TX.
  We create a dedicated XSK RQ inside the channel, which means that two RQs are running simultaneously: one for non-XSK traffic and the other for XSK traffic. The regular and XSK RQs use a single ID namespace split into two halves: the lower half is regular RQs, and the upper half is XSK RQs. When any zero-copy AF_XDP socket is active, changing the number of channels is not allowed, because it would break the mapping between XSK RQ IDs and channels.
  XSK requires different page allocation and release routines. Such functions as mlx5e_{alloc,free}_rx_mpwqe and mlx5e_{get,put}_rx_frag are generic enough to be used for both regular and XSK RQs, and they use the mlx5e_page_{alloc,release} wrappers around the real allocation functions. Function pointers are not used to avoid losing the performance with retpolines. Wherever it's certain that the regular (non-XSK) page release function should be used, it's called directly.
  Only the stats that could be meaningful for XSK are exposed to the userspace. Those that don't take part in the XSK flow are not considered.
  Note that we don't wait for WQEs on the XSK RQ (unlike the regular RQ), because the newer xdpsock sample doesn't provide any Fill Ring entries at the setup stage.
  We create a dedicated XSK SQ in the channel. This separation has its advantages:
  1. When the UMEM is closed, the XSK SQ can also be closed and stop receiving completions. If an existing SQ was used for XSK, it would continue receiving completions for the packets of the closed socket. If a new UMEM was opened at that point, it would start getting completions that don't belong to it.
  2. Calculating statistics separately.
  When the userspace kicks the TX, the driver triggers a hardware interrupt by posting a NOP to a dedicated XSK ICO (internal control operations) SQ, in order to trigger NAPI on the right CPU core. This XSK ICO SQ is protected by a spinlock, as the userspace application may kick the TX from any core.
  Store the pointers to the UMEMs in the net device private context, independently from the kernel. This way the driver can distinguish between the zero-copy and non-zero-copy UMEMs. The kernel function xdp_get_umem_from_qid does not care about this difference, but the driver is only interested in zero-copy UMEMs; particularly, on the cleanup it determines whether to close the XSK RQ and SQ or not by looking at the presence of the UMEM. Use state_lock to protect the access to this area of UMEM pointers.
  LRO isn't compatible with XDP, but there may be active UMEMs while XDP is off. If this is the case, don't allow LRO to ensure XDP can be re-enabled at any time.
  The validation of XSK parameters typically happens when XSK queues open. However, when the interface is down or the XDP program isn't set, it's still possible to have active AF_XDP sockets and even to open new ones, but the XSK queues will be closed. To cover these cases, perform the validation also in these flows:
  1. A new UMEM is registered, but the XSK queues aren't going to be created due to a missing XDP program or the interface being down.
  2. MTU changes while there are UMEMs registered.
  Having this early check prevents mlx5e_open_channels from failing at a later stage, where recovery is impossible and the application has no chance to handle the error, because it got the successful return value for an MTU change or XSK open operation.
  The performance testing was performed on a machine with the following configuration:
  - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
  - Mellanox ConnectX-5 Ex with 100 Gbit/s link
  The results with retpoline disabled, single stream:
    txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU)
    rxdrop: 12.2 Mpps
    l2fwd: 9.4 Mpps
  The results with retpoline enabled, single stream:
    txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU)
    rxdrop: 9.9 Mpps
    l2fwd: 6.8 Mpps
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | net/mlx5e: Move queue param structs to en/params.h (Maxim Mikityanskiy, 2019-06-27, 1 file, -0/+31)
  structs mlx5e_{rq,sq,cq,channel}_param are going to be used in the upcoming XSK RX and TX patches. Move them to a header file to make them accessible from other C files.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | net/mlx5e: Consider XSK in XDP MTU limit calculation (Maxim Mikityanskiy, 2019-06-27, 4 files, -5/+9)
  Use the existing mlx5e_get_linear_rq_headroom function to calculate the headroom for mlx5e_xdp_max_mtu. This function takes the XSK headroom into consideration, which will be used in the following patches.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | net/mlx5e: XDP_TX from UMEM support (Maxim Mikityanskiy, 2019-06-27, 1 file, -8/+42)
  When an XDP program returns XDP_TX, and the RQ is XSK-enabled, it requires careful handling, because convert_to_xdp_frame creates a new page and copies the data there, while our driver expects the xdp_frame to point to the same memory as the xdp_buff. Handle this case separately: map the page, and in the end unmap it and call xdp_return_frame.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | net/mlx5e: Share the XDP SQ for XDP_TX between RQs (Maxim Mikityanskiy, 2019-06-27, 2 files, -12/+12)
  Put the XDP SQ that is used for XDP_TX into the channel. It used to be a part of the RQ, but with introduction of AF_XDP there will be one more RQ that could share the same XDP SQ. This patch is a preparation for that change. Separate XDP_TX statistics per RQ were implemented in one of the previous patches.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | net/mlx5e: Refactor struct mlx5e_xdp_info (Maxim Mikityanskiy, 2019-06-27, 2 files, -36/+56)
  Currently, struct mlx5e_xdp_info has some issues that have to be cleaned up before the upcoming AF_XDP support makes things too complicated and messy. This structure is used both when sending the packet and on completion. Moreover, the cleanup procedure on completion depends on the origin of the packet (XDP_REDIRECT, XDP_TX). Adding AF_XDP support will add new flows that use this structure even differently. To avoid overcomplicating the code, this commit refactors the usage of this structure in the following ways:
  1. struct mlx5e_xdp_info is split into two different structures. One is struct mlx5e_xdp_xmit_data, a transient structure that doesn't need to be stored and is only used while sending the packet. The other is still struct mlx5e_xdp_info that is stored in a FIFO and contains the fields needed on completion.
  2. The fields of struct mlx5e_xdp_info that are used in different flows are put into a union. A special enum indicates the cleanup mode and helps choose the right union member. This approach is clear and explicit. Although it could be possible to "guess" the mode by looking at the values of the fields and at the XDP SQ type, it wouldn't be that clear and extendable and would require looking through the whole chain to understand what's going on.
  For reference, these are the fields of struct mlx5e_xdp_info that are used in different flows (including AF_XDP ones):
  Packet origin          | Fields used on completion | Cleanup steps
  -----------------------+---------------------------+-------------------
  XDP_REDIRECT,          | xdpf, dma_addr            | DMA unmap and
  XDP_TX from XSK RQ     |                           | xdp_return_frame.
  -----------------------+---------------------------+-------------------
  XDP_TX from regular RQ | di                        | Recycle page.
  -----------------------+---------------------------+-------------------
  AF_XDP TX              | (none)                    | Increment the
                         |                           | producer index in
                         |                           | Completion Ring.
  On send, the same set of mlx5e_xdp_xmit_data fields is used in all flows: DMA and virtual addresses and length.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
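  A rough sketch of the resulting layout described above (simplified; field names are partly assumptions):

  /* Transient data used only while building the transmit descriptor. */
  struct mlx5e_xdp_xmit_data {
          dma_addr_t dma_addr;
          void      *data;
          u32        len;
  };

  /* Tells completion which cleanup to run and which union member is valid. */
  enum mlx5e_xdp_xmit_mode {
          MLX5E_XDP_XMIT_MODE_FRAME, /* XDP_REDIRECT / XDP_TX from XSK RQ: unmap + xdp_return_frame() */
          MLX5E_XDP_XMIT_MODE_PAGE,  /* XDP_TX from a regular RQ: recycle the page                    */
          MLX5E_XDP_XMIT_MODE_XSK,   /* AF_XDP TX: only bump the Completion Ring producer index       */
  };

  /* Stored in the SQ FIFO; consumed on completion. */
  struct mlx5e_xdp_info {
          enum mlx5e_xdp_xmit_mode mode;
          union {
                  struct {
                          struct xdp_frame *xdpf;
                          dma_addr_t dma_addr;
                  } frame;
                  struct {
                          struct mlx5e_dma_info di;
                  } page;
          };
  };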
| * | net/mlx5e: Calculate linear RX frag size considering XSK (Maxim Mikityanskiy, 2019-06-27, 2 files, -22/+51)
  Additional conditions introduced:
  - XSK implies XDP.
  - Headroom includes the XSK headroom if it exists.
  - No space is reserved for struct skb_shared_info in XSK mode.
  - Fragment size smaller than the XSK chunk size is not allowed.
  A new auxiliary function mlx5e_get_linear_rq_headroom with the support for XSK is introduced. Use this function in the implementation of mlx5e_get_rq_headroom. Change headroom to u32 to match the headroom field in struct xdp_umem.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
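  For illustration only, the headroom rule above could be expressed like this (NET_IP_ALIGN, XDP_PACKET_HEADROOM and MLX5_RX_HEADROOM are existing defines; the function body itself is an assumption, not the exact driver code):

  static u32 linear_rq_headroom(struct mlx5e_params *params,
                                struct mlx5e_xsk_param *xsk)
  {
          u32 headroom = NET_IP_ALIGN;

          if (params->xdp_prog || xsk) {          /* XSK implies XDP */
                  headroom += XDP_PACKET_HEADROOM;
                  if (xsk)
                          headroom += xsk->headroom; /* per-UMEM headroom from userspace */
          } else {
                  headroom += MLX5_RX_HEADROOM;
          }
          return headroom;
  }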
| * | net/mlx5e: Replace deprecated PCI_DMA_TODEVICE (Maxim Mikityanskiy, 2019-06-27, 1 file, -1/+1)
  The PCI API for DMA is deprecated, and PCI_DMA_TODEVICE is just defined to DMA_TO_DEVICE for backward compatibility. Just use DMA_TO_DEVICE.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Acked-by: Saeed Mahameed <saeedm@mellanox.com>
  Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | net/mlx5e: Disallow tc redirect offload cases we don't support (Paul Blakey, 2019-06-29, 1 file, -1/+3)
  After changing the parent_id to be the same for both NICs of the same hardware device, netdev_port_same_parent_id now returns true for more cases (all the lower devices in the hierarchy are on the same hardware device). If merged eswitch isn't enabled, these cases aren't supported, so disallow them.
  Signed-off-by: Paul Blakey <paulb@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2019-06-18, 1 file, -5/+6)
  Honestly all the conflicts were simple overlapping changes, nothing really interesting to report.
  Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Support tagged tunnel over bond (Eli Britstein, 2019-06-07, 1 file, -5/+6)
  Stacked devices like a bond interface may have a VLAN device on top of them. Detect the lag state correctly under this condition, and return the correct routed net device, according to which the encap header is built.
  Fixes: e32ee6c78efa ("net/mlx5e: Support tunnel encap over tagged Ethernet")
  Signed-off-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: Geneve, Add support for encap/decap flows offload (Yevgeny Kliteynik, 2019-05-31, 3 files, -0/+340)
  Add HW offloading support for flows with Geneve encap/decap.
  Notes about decap flows with Geneve TLV Options:
  - Support offloading of 32-bit options data only
  - At any given time, only one combination of class/type parameters can be offloaded, but the same class/type combination can have many different flows offloaded with different 32-bit option data
  - Options with value of 0 can't be offloaded
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: Rearrange tc tunnel code in a modular way (Yevgeny Kliteynik, 2019-05-31, 4 files, -200/+366)
  Rearrange tc tunnel code so that it would be easy to add future tunnels:
  - Define tc tunnel object with the fields and callbacks that any tunnel must implement.
  - Define tc UDP tunnel object for UDP tunnels, such as VXLAN
  - Move each tunnel code (GRE, VXLAN) to its own separate file
  - Rewrite tc tunnel implementation in a general way - using only the objects and their callbacks.
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: Geneve, Keep tunnel info as pointer to the original struct (Yevgeny Kliteynik, 2019-05-31, 1 file, -7/+8)
  In the mlx5e encap entry structure, the IP tunnel info data structure is copied by value. This approach worked till now, but it breaks when there are encapsulation options, such as in the case of Geneve. These options are stored in the structure that is allocated adjacent to the IP tunnel info struct, and not pointed at by any field in that struct. Therefore, when copying the struct by value, we lose the address of the original struct and can't get to the encapsulation options. Fix the problem by storing the pointer to the tunnel info data instead.
  Reviewed-by: Oz Shlomo <ozsh@mellanox.com>
  Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* treewide: Add SPDX license identifier - Makefile/Kconfig (Thomas Gleixner, 2019-05-21, 1 file, -0/+1)
  Add SPDX license identifiers to all Make/Kconfig files which:
  - Have no license information of any form
  These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0-only
  Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* net/mlx5e: Put the common XDP code into a function (Maxim Mikityanskiy, 2019-05-01, 1 file, -36/+27)
  The same code that returns XDP frames and releases pages is used both in mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs. Create a function that cleans up an MPWQE.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2019-04-26, 2 files, -4/+23)
  Two easy cases of overlapping changes.
  Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Fix the max MTU check in case of XDP (Maxim Mikityanskiy, 2019-04-19, 2 files, -2/+21)
  MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ. This commit fixes the calculations and adds a brief explanation for the formula used.
  Fixes: a26a5bdf3ee2d ("net/mlx5e: Restrict the combination of large MTU and XDP")
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
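  The corrected bound roughly follows this shape (a sketch, not the exact driver code; SKB_MAX_HEAD, NET_IP_ALIGN and XDP_PACKET_HEADROOM are existing kernel macros, MLX5E_HW2SW_MTU is the driver's HW-to-SW MTU conversion):

  /* Largest MTU for which an XDP frame still fits in one page together with
   * the headroom and the skb_shared_info tail. */
  static int xdp_max_mtu(struct mlx5e_params *params)
  {
          int headroom = NET_IP_ALIGN + XDP_PACKET_HEADROOM;

          /* SKB_MAX_HEAD(headroom) = PAGE_SIZE - headroom
           *                          - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */
          return MLX5E_HW2SW_MTU(params, SKB_MAX_HEAD(headroom));
  }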
| * net/mlx5e: Fix use-after-free after xdp_return_frame (Maxim Mikityanskiy, 2019-04-19, 1 file, -2/+2)
  xdp_return_frame releases the frame. It leads to releasing the page, so it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves the memory access to precede the return of the frame.
  Fixes: 58b99ee3e3ebe ("net/mlx5e: Add support for XDP_REDIRECT in device-out side")
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
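  The ordering rule this enforces, as a tiny illustrative fragment (the xdpi/stats names are placeholders, not the driver's exact completion code):

          /* Read everything needed from the frame before giving it back. */
          stats->tx_bytes += xdpi.xdpf->len;      /* the frame is still owned by us here  */
          dma_unmap_single(dev, xdpi.dma_addr, xdpi.xdpf->len, DMA_TO_DEVICE);
          xdp_return_frame(xdpi.xdpf);            /* the backing page may be freed here   */
          /* no access to xdpi.xdpf past this point */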
* | net/mlx5e: Remove unused parameter (Maxim Mikityanskiy, 2019-04-23, 2 files, -6/+4)
  mdev is unused in mlx5e_rx_is_linear_skb.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: Add an underflow warning comment (Maxim Mikityanskiy, 2019-04-23, 1 file, -2/+5)
  mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on the requested number of frames in the RQ (F) and the number of packets per WQE (P). It ensures that N is not less than the minimum number of WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min should be true. This function deals with logarithms, so it should check that log(F) - log(P) >= log(N_min). However, if F < P, this expression will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min) instead.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
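  The rearranged check, as a self-contained sketch (the minimum-size constant name is an assumption; the comparison is what matters):

  /* All quantities are base-2 logarithms. Rewriting
   *   log(F) - log(P) >= log(N_min)   as   log(F) >= log(P) + log(N_min)
   * avoids the unsigned underflow when F < P. */
  static u8 mpwqe_log_rq_size(u8 log_frames, u8 log_pkts_per_wqe)
  {
          const u8 log_min_wqes = MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE_MPW;

          if (log_frames < log_pkts_per_wqe + log_min_wqes)
                  return log_min_wqes;            /* clamp to the minimum RQ size */
          return log_frames - log_pkts_per_wqe;
  }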
* | net/mlx5e: Move parameter calculation functions to en/params.c (Maxim Mikityanskiy, 2019-04-23, 2 files, -0/+125)
  This commit moves the parameter calculation functions to a separate file for better modularity and code sharing with future features.
  Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow (Shay Agroskin, 2019-04-23, 2 files, -12/+67)
  Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B).
  When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descriptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. This is added to better utilize the HW resources (which now makes one less packet data prefetch) and allow better scalability, on the account of CPU usage (which now 'memcpy's the packet into the WQE).
  To load balance between HW and CPU and get max packet rate, we use watermarks to detect how much the HW is congested and move the work loads back and forth between HW and CPU.
  Performance:
  Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz (tested with hyper-threading disabled).
  XDP_TX:
  |          | before | after   |       |
  | 24 rings | 51Mpps | 116Mpps | +126% |
  | 1 ring   | 12Mpps | 12Mpps  | same  |
  XDP_REDIRECT (below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch):
  |          | before  | after   |      |
  | 32 rings | 64Mpps  | 92Mpps  | +43% |
  | 1 ring   | 6.4Mpps | 6.4Mpps | same |
  As we can see, the feature significantly improves scaling, without hurting single ring performance.
  Signed-off-by: Shay Agroskin <shayag@mellanox.com>
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: XDP, Add TX MPWQE session counter (Shay Agroskin, 2019-04-23, 1 file, -0/+2)
  This counter tracks how many TX MPWQE sessions are started in XDP SQ in XDP TX/REDIRECT flow. It counts per-channel and global stats.
  Signed-off-by: Shay Agroskin <shayag@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: XDP, Enhance RQ indication for XDP redirect flush (Tariq Toukan, 2019-04-23, 1 file, -3/+3)
  The XDP redirect flush indication belongs to the receive queue, not to its XDP send queue. For this, use a new bit on rq->flags.
  Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
  Reviewed-by: Shay Agroskin <shayag@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2019-04-17, 2 files, -3/+12)
  Conflict resolution of af_smc.c from Stephen Rothwell.
  Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Protect against non-uplink representor for encap (Dmytro Linkin, 2019-04-09, 1 file, -0/+4)
  TC encap offload is supported only for the physical uplink representor. Fail for non uplink representor.
  Fixes: 3e621b19b0bb ("net/mlx5e: Support TC encapsulation offloads with upper devices")
  Signed-off-by: Dmytro Linkin <dmitrolin@mellanox.com>
  Reviewed-by: Eli Britstein <elibr@mellanox.com>
  Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Use fail-safe channels reopen in tx reporter recover (Eran Ben Elisha, 2019-04-09, 1 file, -2/+1)
  When requested to recover from error, the tx reporter might open new channels and close the existing ones. Use safe channels switch flow in order to guarantee opened channels at the end of the recover flow. For this purpose, define mlx5e_safe_reopen_channels function and use it within those flows.
  Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
  Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Skip un-needed tx recover if interface state is down (Eran Ben Elisha, 2019-04-09, 1 file, -1/+7)
  Skip the recover operation if the interface is in down state, as TX objects are not open. This fixes a bug where the recover flow re-opened TX objects which were not opened before, leading to a possible memory leak at driver unload.
  Fixes: de8650a82071 ("net/mlx5e: Add tx reporter support")
  Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
  Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
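  The guard amounts to an early return keyed on the opened state; an illustrative fragment (assuming the driver's existing MLX5E_STATE_OPENED bit and a priv pointer in scope):

          /* Nothing to recover when the interface is down: TX objects were never opened. */
          if (!test_bit(MLX5E_STATE_OPENED, &priv->state))
                  return 0;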
* | ipv4: Prepare rtable for IPv6 gateway (David Ahern, 2019-04-09, 1 file, -1/+1)
  To allow the gateway to be either an IPv4 or IPv6 address, remove rt_uses_gateway from rtable and replace with rt_gw_family. If rt_gw_family is set it implies rt_uses_gateway. Rename rt_gateway to rt_gw4 to represent the IPv4 version.
  Signed-off-by: David Ahern <dsahern@gmail.com>
  Reviewed-by: Ido Schimmel <idosch@mellanox.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge tag 'mlx5-updates-2019-04-02' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux (David S. Miller, 2019-04-08, 1 file, -3/+5)
  Saeed Mahameed says:
  ====================
  mlx5-updates-2019-04-02
  This series provides misc updates to mlx5 driver
  1) Aya Levin (1): Handle event of power detection in the PCIE slot
  2) Eli Britstein (6): Some TC VLAN related updates and fixes to the previous VLAN modify action support patchset. Offload TC e-switch rules with egress/ingress VLAN devices
  3) Max Gurtovoy (1): Fix double mutex initialization in eswitch.c
  4) Tariq Toukan (3): Misc small updates. A write memory barrier is sufficient in EQ ci update; Obsolete param field holding a constant value; Unify logic of MTU boundaries
  5) Tonghao Zhang (4): Misc updates to en_tc.c. Make the log friendly when decapsulation offload not supported; Remove 'parse_attr' argument in parse_tc_fdb_actions(); Deletes unnecessary setting of esw_attr->parse_attr; Return -EOPNOTSUPP when attempting to offload an unsupported action
  ====================
  Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/mlx5e: Make the log friendly when decapsulation offload not supported (Tonghao Zhang, 2019-04-05, 1 file, -3/+5)
  If we try to offload decapsulation actions to VFs hw, we get the log [1]. It's not friendly, because the kind of net device is null, and we don't know what '0' means.
  [1] "mlx5_core 0000:05:01.2 vf_0: decapsulation offload is not supported for net device (0)"
  Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
  Reviewed-by: Roi Dayan <roid@mellanox.com>
  Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (David S. Miller, 2019-04-05, 2 files, -20/+22)
  Minor comment merge conflict in mlx5. Staging driver has a fixup due to the skb->xmit_more changes in 'net-next', but was removed in 'net'.
  Signed-off-by: David S. Miller <davem@davemloft.net>