summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
Commit message (Collapse)AuthorAgeFilesLines
* net/mlx5e: Add XSK zero-copy supportMaxim Mikityanskiy2019-06-271-23/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds support for AF_XDP zero-copy RX and TX. We create a dedicated XSK RQ inside the channel, it means that two RQs are running simultaneously: one for non-XSK traffic and the other for XSK traffic. The regular and XSK RQs use a single ID namespace split into two halves: the lower half is regular RQs, and the upper half is XSK RQs. When any zero-copy AF_XDP socket is active, changing the number of channels is not allowed, because it would break to mapping between XSK RQ IDs and channels. XSK requires different page allocation and release routines. Such functions as mlx5e_{alloc,free}_rx_mpwqe and mlx5e_{get,put}_rx_frag are generic enough to be used for both regular and XSK RQs, and they use the mlx5e_page_{alloc,release} wrappers around the real allocation functions. Function pointers are not used to avoid losing the performance with retpolines. Wherever it's certain that the regular (non-XSK) page release function should be used, it's called directly. Only the stats that could be meaningful for XSK are exposed to the userspace. Those that don't take part in the XSK flow are not considered. Note that we don't wait for WQEs on the XSK RQ (unlike the regular RQ), because the newer xdpsock sample doesn't provide any Fill Ring entries at the setup stage. We create a dedicated XSK SQ in the channel. This separation has its advantages: 1. When the UMEM is closed, the XSK SQ can also be closed and stop receiving completions. If an existing SQ was used for XSK, it would continue receiving completions for the packets of the closed socket. If a new UMEM was opened at that point, it would start getting completions that don't belong to it. 2. Calculating statistics separately. When the userspace kicks the TX, the driver triggers a hardware interrupt by posting a NOP to a dedicated XSK ICO (internal control operations) SQ, in order to trigger NAPI on the right CPU core. This XSK ICO SQ is protected by a spinlock, as the userspace application may kick the TX from any core. Store the pointers to the UMEMs in the net device private context, independently from the kernel. This way the driver can distinguish between the zero-copy and non-zero-copy UMEMs. The kernel function xdp_get_umem_from_qid does not care about this difference, but the driver is only interested in zero-copy UMEMs, particularly, on the cleanup it determines whether to close the XSK RQ and SQ or not by looking at the presence of the UMEM. Use state_lock to protect the access to this area of UMEM pointers. LRO isn't compatible with XDP, but there may be active UMEMs while XDP is off. If this is the case, don't allow LRO to ensure XDP can be reenabled at any time. The validation of XSK parameters typically happens when XSK queues open. However, when the interface is down or the XDP program isn't set, it's still possible to have active AF_XDP sockets and even to open new, but the XSK queues will be closed. To cover these cases, perform the validation also in these flows: 1. A new UMEM is registered, but the XSK queues aren't going to be created due to missing XDP program or interface being down. 2. MTU changes while there are UMEMs registered. Having this early check prevents mlx5e_open_channels from failing at a later stage, where recovery is impossible and the application has no chance to handle the error, because it got the successful return value for an MTU change or XSK open operation. The performance testing was performed on a machine with the following configuration: - 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz - Mellanox ConnectX-5 Ex with 100 Gbit/s link The results with retpoline disabled, single stream: txonly: 33.3 Mpps (21.5 Mpps with queue and app pinned to the same CPU) rxdrop: 12.2 Mpps l2fwd: 9.4 Mpps The results with retpoline enabled, single stream: txonly: 21.3 Mpps (14.1 Mpps with queue and app pinned to the same CPU) rxdrop: 9.9 Mpps l2fwd: 6.8 Mpps Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Consider XSK in XDP MTU limit calculationMaxim Mikityanskiy2019-06-271-2/+3
| | | | | | | | | | | Use the existing mlx5e_get_linear_rq_headroom function to calculate the headroom for mlx5e_xdp_max_mtu. This function takes the XSK headroom into consideration, which will be used in the following patches. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: XDP_TX from UMEM supportMaxim Mikityanskiy2019-06-271-8/+42
| | | | | | | | | | | | | | When an XDP program returns XDP_TX, and the RQ is XSK-enabled, it requires careful handling, because convert_to_xdp_frame creates a new page and copies the data there, while our driver expects the xdp_frame to point to the same memory as the xdp_buff. Handle this case separately: map the page, and in the end unmap it and call xdp_return_frame. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Share the XDP SQ for XDP_TX between RQsMaxim Mikityanskiy2019-06-271-10/+10
| | | | | | | | | | | | | | | Put the XDP SQ that is used for XDP_TX into the channel. It used to be a part of the RQ, but with introduction of AF_XDP there will be one more RQ that could share the same XDP SQ. This patch is a preparation for that change. Separate XDP_TX statistics per RQ were implemented in one of the previous patches. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Refactor struct mlx5e_xdp_infoMaxim Mikityanskiy2019-06-271-30/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, struct mlx5e_xdp_info has some issues that have to be cleaned up before the upcoming AF_XDP support makes things too complicated and messy. This structure is used both when sending the packet and on completion. Moreover, the cleanup procedure on completion depends on the origin of the packet (XDP_REDIRECT, XDP_TX). Adding AF_XDP support will add new flows that use this structure even differently. To avoid overcomplicating the code, this commit refactors the usage of this structure in the following ways: 1. struct mlx5e_xdp_info is split into two different structures. One is struct mlx5e_xdp_xmit_data, a transient structure that doesn't need to be stored and is only used while sending the packet. The other is still struct mlx5e_xdp_info that is stored in a FIFO and contains the fields needed on completion. 2. The fields of struct mlx5e_xdp_info that are used in different flows are put into a union. A special enum indicates the cleanup mode and helps choose the right union member. This approach is clear and explicit. Although it could be possible to "guess" the mode by looking at the values of the fields and at the XDP SQ type, it wouldn't be that clear and extendable and would require looking through the whole chain to understand what's going on. For the reference, there are the fields of struct mlx5e_xdp_info that are used in different flows (including AF_XDP ones): Packet origin | Fields used on completion | Cleanup steps -----------------------+---------------------------+------------------ XDP_REDIRECT, | xdpf, dma_addr | DMA unmap and XDP_TX from XSK RQ | | xdp_return_frame. -----------------------+---------------------------+------------------ XDP_TX from regular RQ | di | Recycle page. -----------------------+---------------------------+------------------ AF_XDP TX | (none) | Increment the | | producer index in | | Completion Ring. On send, the same set of mlx5e_xdp_xmit_data fields is used in all flows: DMA and virtual addresses and length. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Replace deprecated PCI_DMA_TODEVICEMaxim Mikityanskiy2019-06-271-1/+1
| | | | | | | | | | The PCI API for DMA is deprecated, and PCI_DMA_TODEVICE is just defined to DMA_TO_DEVICE for backward compatibility. Just use DMA_TO_DEVICE. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Put the common XDP code into a functionMaxim Mikityanskiy2019-05-011-36/+27Star
| | | | | | | | | The same code that returns XDP frames and releases pages is used both in mlx5e_poll_xdpsq_cq and mlx5e_free_xdpsq_descs. Create a function that cleans up an MPWQE. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-04-261-2/+22
|\ | | | | | | | | | | Two easy cases of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Fix the max MTU check in case of XDPMaxim Mikityanskiy2019-04-191-0/+20
| | | | | | | | | | | | | | | | | | | | | | MLX5E_XDP_MAX_MTU was calculated incorrectly. It didn't account for NET_IP_ALIGN and MLX5E_HW2SW_MTU, and it also misused MLX5_SKB_FRAG_SZ. This commit fixes the calculations and adds a brief explanation for the formula used. Fixes: a26a5bdf3ee2d ("net/mlx5e: Restrict the combination of large MTU and XDP") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Fix use-after-free after xdp_return_frameMaxim Mikityanskiy2019-04-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | xdp_return_frame releases the frame. It leads to releasing the page, so it's not allowed to access xdpi.xdpf->len after that, because xdpi.xdpf is at xdp->data_hard_start after convert_to_xdp_frame. This patch moves the memory access to precede the return of the frame. Fixes: 58b99ee3e3ebe ("net/mlx5e: Add support for XDP_REDIRECT in device-out side") Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flowShay Agroskin2019-04-231-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Upon high packet rate with multiple CPUs TX workloads, much of the HCA's resources are spent on prefetching TX descriptors, thus affecting transmission rates. This patch comes to mitigate this problem by moving some workload to the CPU and reducing the HW data prefetch overhead for small packets (<= 256B). When forwarding packets with XDP, a packet that is smaller than a certain size (set to ~256 bytes) would be sent inline within its WQE TX descrptor (mem-copied), when the hardware tx queue is congested beyond a pre-defined water-mark. This is added to better utilize the HW resources (which now makes one less packet data prefetch) and allow better scalability, on the account of CPU usage (which now 'memcpy's the packet into the WQE). To load balance between HW and CPU and get max packet rate, we use watermarks to detect how much the HW is congested and move the work loads back and forth between HW and CPU. Performance: Tested packet rate for UDP 64Byte multi-stream over two dual port ConnectX-5 100Gbps NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz * Tested with hyper-threading disabled XDP_TX: | | before | after | | | 24 rings | 51Mpps | 116Mpps | +126% | | 1 ring | 12Mpps | 12Mpps | same | XDP_REDIRECT: ** Below is the transmit rate, not the redirection rate which might be larger, and is not affected by this patch. | | before | after | | | 32 rings | 64Mpps | 92Mpps | +43% | | 1 ring | 6.4Mpps | 6.4Mpps | same | As we can see, feature significantly improves scaling, without hurting single ring performance. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: XDP, Add TX MPWQE session counterShay Agroskin2019-04-231-0/+2
| | | | | | | | | | | | | | | | This counter tracks how many TX MPWQE sessions are started in XDP SQ in XDP TX/REDIRECT flow. It counts per-channel and global stats. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | net/mlx5e: XDP, Enhance RQ indication for XDP redirect flushTariq Toukan2019-04-231-3/+3
|/ | | | | | | | | | | The XDP redirect flush indication belongs to the receive queue, not to its XDP send queue. For this, use a new bit on rq->flags. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reviewed-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, fix redirect resources availability checkSaeed Mahameed2019-02-141-4/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | Currently mlx5 driver creates xdp redirect hw queues unconditionally on netdevice open, This is great until someone starts redirecting XDP traffic via ndo_xdp_xmit on mlx5 device and changes the device configuration at the same time, this might cause crashes, since the other device's napi is not aware of the mlx5 state change (resources un-availability). To fix this we must synchronize with other devices napi's on the system. Added a new flag under mlx5e_priv to determine XDP TX resources are available, set/clear it up when necessary and use synchronize_rcu() when the flag is turned off, so other napi's are in-sync with it, before we actually cleanup the hw resources. The flag is tested prior to committing to transmit on mlx5e_xdp_xmit, and it is sufficient to determine if it safe to transmit or not. The other two internal flags (MLX5E_STATE_OPENED and MLX5E_SQ_STATE_ENABLED) become unnecessary. Thus, they are removed from data path. Fixes: 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side") Reported-by: Toke Høiland-Jørgensen <toke@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, Support Enhanced Multi-Packet TX WQETariq Toukan2018-12-211-4/+108
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for the HW feature of multi-packet WQE in XDP xmit flow. The conventional TX descriptor (WQE, Work Queue Element) serves a single packet. Our HW has support for multi-packet WQE (MPWQE) in which a single descriptor serves multiple TX packets. This reduces both the PCI overhead and the CPU cycles wasted on writing them. In this patch we add support for the HW feature, which is supported starting from ConnectX-5. Performance: Tested packet rate for UDP 64Byte multi-stream over ConnectX-5 NICs. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz XDP_TX: We see a huge gain on single port ConnectX-5, and reach the 100 Mpps milestone. * Single-port HCA: Before: 70 Mpps After: 100 Mpps (+42.8%) * Dual-port HCA: Before: 51.7 Mpps After: 57.3 Mpps (+10.8%) * In both cases we tested traffic on one port and for now On Dual-port HCAs we see only small gain, we are working to overcome this bottleneck, but for the moment only with experimental firmware on dual port HCAs we can reach the wanted numbers as seen on Single-port HCAs. XDP_REDIRECT: Redirect from (A) ConnectX-5 to (B) ConnectX-5. Due to a setup limitation, (A) and (B) are on different NUMA nodes, so absolute performance numbers are not optimal. Note: Below is the transmit rate of (B), not the redirect rate of (A) which is in some cases higher. * (B) is single-port: Before: 77 Mpps After: 90 Mpps (+16.8%) * (B) is dual-port: Before: 61 Mpps After: 72 Mpps (+18%) Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, Add array for WQE info descriptorsTariq Toukan2018-12-211-21/+37
| | | | | | | | | | Each xdp_wqe_info instance describes the number of data-segments and WQEBBs of the WQE. This is useful for a downstream patch that adds support for Multi-Packet TX WQE feature. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, Maintain a FIFO structure for xdp_info instancesTariq Toukan2018-12-211-16/+16
| | | | | | | | This provides infrastructure to have multiple xdp_info instances for the same consumer index. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, Replace boolean doorbell indication with segment pointerTariq Toukan2018-12-211-10/+4Star
| | | | | | | | | Instead of calculating the control segment to be used upon an XDP xmit doorbell, save it in SQ structure. Nullify when no pending doorbell. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, Warn upon polling an error CQETariq Toukan2018-12-211-0/+5
| | | | | | | | Do not ignore the CQE opcode. This helps expose issues and debug them. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, Change the XDP SQ redirect indicationTariq Toukan2018-12-211-10/+4Star
| | | | | | | | | | | | | Do not maintain an SQ state bit to indicate whether an XDP SQ serves redirect operations. Instead, rely on the fact that such an XDP SQ doesn't reside in an RQ instance, while the others do. This info is not known to the XDP SQ functions themselves, and they rely on their callers to distinguish between the cases. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: XDP, Precede XDP-related operations in RQ poll by a loaded ↵Tariq Toukan2018-12-211-0/+15
| | | | | | | | | | | | | | program check At the end of the RQ polling loop, some XDP-related operations might be required. Before checking them one by one, check if an XDP program is even loaded. Combine all the checks and operations in a single function in xdp files. This saves unnecessary checks for non-XDP flows. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Mark expected switch fall-throughsGustavo A. R. Silva2018-08-081-0/+2
| | | | | | | | | | In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. Addresses-Coverity-ID: 114808 ("Missing break in switch") Addresses-Coverity-ID: 114802 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx5: handle DMA mapping error case for XDP redirectJesper Dangaard Brouer2018-07-311-0/+3
| | | | | | | | | | | | Commit 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side") forgot to return/free the xdp_frame in case the DMA mapping failed, correct this. Also DMA unmap the frame in case mlx5e_xmit_xdp_frame() fails. Fixes: 58b99ee3e3eb ("net/mlx5e: Add support for XDP_REDIRECT in device-out side") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx5e: Add support for XDP_REDIRECT in device-out sideTariq Toukan2018-07-271-14/+78
| | | | | | | | | | | | | | | | | | | | | Add implementation for the ndo_xdp_xmit callback. Dedicate a new set of XDP-SQ instances to satisfy the XDP_REDIRECT requests. These instances are totally separated from the existing XDP-SQ objects that satisfy local XDP_TX actions. Performance tests: xdp_redirect_map from ConnectX-5 to ConnectX-5. CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz Packet-rate of 64B packets. Single queue: 7 Mpps. Multi queue: 55 Mpps. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Re-order fields of struct mlx5e_xdpsqTariq Toukan2018-07-271-4/+4
| | | | | | | | | | | | | | | | | In the downstream patch that adds support to XDP_REDIRECT-out, the XDP xmit frame function doesn't share the same run context as the NAPI that polls the XDP-SQ completion queue. Hence, need to re-order the XDP-SQ fields to avoid cacheline false-sharing. Take redirect_flush and doorbell out of DB, into separated cachelines. Add a cacheline breaker within the stats struct. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Refactor XDP countersTariq Toukan2018-07-271-7/+5Star
| | | | | | | | | | | | Separate the XDP counters into two sets: (1) One set reside in the RQ stats, and they monitor XDP stats in the RQ side. (2) Another set is per XDP-SQ, and they monitor XDP stats that are related to XDP transmit flow. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Make XDP xmit functions more genericTariq Toukan2018-07-271-28/+42
| | | | | | | | | | | Convert the XDP xmit functions to use the generic xdp_frame API in XDP_TX flow. Same functions will be used later in this series to transmit the XDP redirect-out packets as well. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Add counter for XDP redirect in RXTariq Toukan2018-07-271-0/+1
| | | | | | | | Add per-ring and total stats for received packets that goes into XDP redirection. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Move XDP related code into new XDP filesTariq Toukan2018-07-271-0/+225
Take XDP code out of the general EN header and RX file into new XDP files. Currently, XDP-SQ resides only within an RQ and used from a single flow (XDP_TX) triggered upon RX completions. In a downstream patch, additional type of XDP-SQ instances will be presented and used for the XDP_REDIRECT flow, totally unrelated to the RX context. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>