summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox
Commit message (Collapse)AuthorAgeFilesLines
* net/mlx5e: Refactor struct mlx5e_xdp_infoMaxim Mikityanskiy2019-06-273-41/+97
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, struct mlx5e_xdp_info has some issues that have to be cleaned up before the upcoming AF_XDP support makes things too complicated and messy. This structure is used both when sending the packet and on completion. Moreover, the cleanup procedure on completion depends on the origin of the packet (XDP_REDIRECT, XDP_TX). Adding AF_XDP support will add new flows that use this structure even differently. To avoid overcomplicating the code, this commit refactors the usage of this structure in the following ways: 1. struct mlx5e_xdp_info is split into two different structures. One is struct mlx5e_xdp_xmit_data, a transient structure that doesn't need to be stored and is only used while sending the packet. The other is still struct mlx5e_xdp_info that is stored in a FIFO and contains the fields needed on completion. 2. The fields of struct mlx5e_xdp_info that are used in different flows are put into a union. A special enum indicates the cleanup mode and helps choose the right union member. This approach is clear and explicit. Although it could be possible to "guess" the mode by looking at the values of the fields and at the XDP SQ type, it wouldn't be that clear and extendable and would require looking through the whole chain to understand what's going on. For the reference, there are the fields of struct mlx5e_xdp_info that are used in different flows (including AF_XDP ones): Packet origin | Fields used on completion | Cleanup steps -----------------------+---------------------------+------------------ XDP_REDIRECT, | xdpf, dma_addr | DMA unmap and XDP_TX from XSK RQ | | xdp_return_frame. -----------------------+---------------------------+------------------ XDP_TX from regular RQ | di | Recycle page. -----------------------+---------------------------+------------------ AF_XDP TX | (none) | Increment the | | producer index in | | Completion Ring. On send, the same set of mlx5e_xdp_xmit_data fields is used in all flows: DMA and virtual addresses and length. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Allow ICO SQ to be used by multiple RQsMaxim Mikityanskiy2019-06-274-23/+22Star
| | | | | | | | | | | | | Prepare to creation of the XSK RQ, which will require posting UMRs, too. The same ICO SQ will be used for both RQs and also to trigger interrupts by posting NOPs. UMR WQEs can't be reused any more. Optimization introduced in commit ab966d7e4ff98 ("net/mlx5e: RX, Recycle buffer of UMR WQEs") is reverted. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Calculate linear RX frag size considering XSKMaxim Mikityanskiy2019-06-273-23/+52
| | | | | | | | | | | | | | | | | | | Additional conditions introduced: - XSK implies XDP. - Headroom includes the XSK headroom if it exists. - No space is reserved for struct shared_skb_info in XSK mode. - Fragment size smaller than the XSK chunk size is not allowed. A new auxiliary function mlx5e_get_linear_rq_headroom with the support for XSK is introduced. Use this function in the implementation of mlx5e_get_rq_headroom. Change headroom to u32 to match the headroom field in struct xdp_umem. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Replace deprecated PCI_DMA_TODEVICEMaxim Mikityanskiy2019-06-271-1/+1
| | | | | | | | | | The PCI API for DMA is deprecated, and PCI_DMA_TODEVICE is just defined to DMA_TO_DEVICE for backward compatibility. Just use DMA_TO_DEVICE. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* net/mlx5e: Attach/detach XDP program safelyMaxim Mikityanskiy2019-06-271-11/+20
| | | | | | | | | | | | | | | | | | | | When an XDP program is set, a full reopen of all channels happens in two cases: 1. When there was no program set, and a new one is being set. 2. When there was a program set, but it's being unset. The full reopen is necessary, because the channel parameters may change if XDP is enabled or disabled. However, it's performed in an unsafe way: if the new channels fail to open, the old ones are already closed, and the interface goes down. Use the safe way to switch channels instead. The same way is already used for other configuration changes. Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* xdp: tracking page_pool resources and safe removalJesper Dangaard Brouer2019-06-191-3/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is needed before we can allow drivers to use page_pool for DMA-mappings. Today with page_pool and XDP return API, it is possible to remove the page_pool object (from rhashtable), while there are still in-flight packet-pages. This is safely handled via RCU and failed lookups in __xdp_return() fallback to call put_page(), when page_pool object is gone. In-case page is still DMA mapped, this will result in page note getting correctly DMA unmapped. To solve this, the page_pool is extended with tracking in-flight pages. And XDP disconnect system queries page_pool and waits, via workqueue, for all in-flight pages to be returned. To avoid killing performance when tracking in-flight pages, the implement use two (unsigned) counters, that in placed on different cache-lines, and can be used to deduct in-flight packets. This is done by mapping the unsigned "sequence" counters onto signed Two's complement arithmetic operations. This is e.g. used by kernel's time_after macros, described in kernel commit 1ba3aab3033b and 5a581b367b5, and also explained in RFC1982. The trick is these two incrementing counters only need to be read and compared, when checking if it's safe to free the page_pool structure. Which will only happen when driver have disconnected RX/alloc side. Thus, on a non-fast-path. It is chosen that page_pool tracking is also enabled for the non-DMA use-case, as this can be used for statistics later. After this patch, using page_pool requires more strict resource "release", e.g. via page_pool_release_page() that was introduced in this patchset, and previous patches implement/fix this more strict requirement. Drivers no-longer call page_pool_destroy(). Drivers already call xdp_rxq_info_unreg() which call xdp_rxq_info_unreg_mem_model(), which will attempt to disconnect the mem id, and if attempt fails schedule the disconnect for later via delayed workqueue. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx5: more strict use of page_pool APIJesper Dangaard Brouer2019-06-192-5/+7
| | | | | | | | | | | | | | | | | | | | | | The mlx5 driver is using page_pool, but not for DMA-mapping (currently), and is a little too relaxed about returning or releasing page resources, as it is not strictly necessary, when not using DMA-mappings. As this patchset is working towards tracking page_pool resources, to know about in-flight frames on shutdown. Then fix places where mlx5 leak page_pool resource. In case of dma_mapping_error, then recycle into page_pool. In mlx5e_free_rq() moved the page_pool_destroy() call to after the mlx5e_page_release() calls, as it is more correct. In mlx5e_page_release() when no recycle was requested, then release page from the page_pool, via page_pool_release_page(). Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* page_pool: introduce page_pool_free and use in mlx5Jesper Dangaard Brouer2019-06-191-3/+3
| | | | | | | | | | | | | | In case driver fails to register the page_pool with XDP return API (via xdp_rxq_info_reg_mem_model()), then the driver can free the page_pool resources more directly than calling page_pool_destroy(), which does a unnecessarily RCU free procedure. This patch is preparing for removing page_pool_destroy(), from driver invocation. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_flower: Implement support for ingress device matchingJiri Pirko2019-06-193-9/+59
| | | | | | | | | Benefit from the previously extended flow_dissector infrastructure and offload matching on ingress port. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_acl: Fix SRC_SYS_PORT element sizeJiri Pirko2019-06-192-5/+5
| | | | | | | | Fix the size of the SRC_SYS_PORT element to be 16. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_acl: Avoid size check for RX_ACL_SYSTEM_PORT elementJiri Pirko2019-06-193-8/+13
| | | | | | | | | | RX_ACL_SYSTEM_PORT is 8 bit but SRC_SYS_PORT is 16 bits. Internally, SRC_SYS_PORT is used to carry the value. Relax the checker in case of RX_ACL_SYSTEM_PORT and allow different size. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_acl: Write RX_ACL_SYSTEM_PORT acl element correctlyJiri Pirko2019-06-193-10/+21
| | | | | | | | | | RX_ACL_SYSTEM_PORT is equal to SRC_SYS_PORT - 1. So before write to block we need to adjust the key value. Introduce new "EXT" helper to implement this. Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx5: add missing void argument to function mlx5_devlink_allocColin Ian King2019-06-191-1/+1
| | | | | | | | | Function mlx5_devlink_alloc is missing a void argument, add it to clean up the non-ANSI function declaration. Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Stop sending in-kernel notifications for each nexthopIdo Schimmel2019-06-181-2/+0Star
| | | | | | | | | | | | | | | | Both listeners - mlxsw and netdevsim - of IPv6 FIB notifications are now ready to handle IPv6 multipath notifications. Therefore, stop ignoring such notifications in both drivers and stop sending notification for each added / deleted nexthop. v2: * Remove 'multipath_rt' from 'struct fib6_entry_notifier_info' Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Create IPv6 multipath routes in one goIdo Schimmel2019-06-181-13/+23
| | | | | | | | | Allow the driver to create an IPv6 multipath route in one go by passing an array of sibling routes and iterating over them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Add / delete multiple IPv6 nexthopsIdo Schimmel2019-06-181-22/+39
| | | | | | | | | | | | Currently, the functions that take care of populating IPv6 nexthop groups only add / delete a single nexthop. Prepare them to handle multiple routes in one notification by passing an array of routes and adding / deleting all of them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Pass array of routes to route handling functionsIdo Schimmel2019-06-181-4/+10
| | | | | | | | | | Prepare the driver to handle multiple routes in a single notification by passing an array of routes to the functions that actually add / delete a route. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Adjust IPv6 replace logic to new notificationsIdo Schimmel2019-06-181-7/+7
| | | | | | | | | | | | | | | | | | Previously, IPv6 replace notifications were only sent from fib6_add_rt2node(). The function only emitted such notifications if a route actually replaced another route. A previous patch added another call site in ip6_route_multipath_add() from which such notification can be emitted even if a route was merely added and did not replace another route. Adjust the driver to take this into account and potentially set the 'replace' flag to 'false' if the notified route did not replace an existing route. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Pass multiple routes to work itemIdo Schimmel2019-06-181-7/+65
| | | | | | | | | | | | | | | Prepare the driver to process IPv6 multipath notifications by passing an array of 'struct fib6_info' instead of just one route. A reference is taken on each sibling route in order to prevent them from being freed until they are processed by the workqueue. v2: * Remove 'multipath_rt' usage Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Prepare function to return errorsIdo Schimmel2019-06-181-3/+11
| | | | | | | | | | | | | | | | The function mlxsw_sp_router_fib6_event() takes care of preparing the needed information for the work item that actually inserts the route into the device. When processing an IPv6 multipath route, the function will need to allocate an array to store pointers to all the sibling routes. Change the function's signature to return an error code and adjust the single call site. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Remove processing of IPv6 append notificationsIdo Schimmel2019-06-181-2/+0Star
| | | | | | | | No such notifications are sent by the IPv6 code, so remove them. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_router: Ignore IPv6 multipath notificationsIdo Schimmel2019-06-181-0/+2
| | | | | | | | | | | | IPv6 multipath notifications are about to be sent, but mlxsw is not ready to process them, so ignore them. The limitation will be lifted by a subsequent patch which will also stop the kernel from sending a notification for each nexthop. Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* mlxsw: spectrum_ptp: Fix compilation on 32-bit ARMShalom Toledo2019-06-181-3/+2Star
| | | | | | | | | | | | | | | | | | | Compilation on 32-bit ARM fails after commit 992aa864dca0 ("mlxsw: spectrum_ptp: Add implementation for physical hardware clock operations") because of 64-bit division: arm-linux-gnueabi-ld: drivers/net/ethernet/mellanox/mlxsw/spectrum_ptp.o: in function `mlxsw_sp1_ptp_phc_settime': spectrum_ptp.c:(.text+0x39c): undefined reference to `__aeabi_uldivmod' Fix by using div_u64(). Fixes: 992aa864dca0 ("mlxsw: spectrum_ptp: Add implementation for physical hardware clock operations") Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2019-06-1813-31/+137
|\ | | | | | | | | | | | | Honestly all the conflicts were simple overlapping changes, nothing really interesting to report. Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum: Disallow prio-tagged packets when PVID is removedIdo Schimmel2019-06-121-1/+1
| | | | | | | | | | | | | | | | | | | | When PVID is removed from a bridge port, the Linux bridge drops both untagged and prio-tagged packets. Align mlxsw with this behavior. Fixes: 148f472da5db ("mlxsw: reg: Add the Switch Port Acceptable Frame Types register") Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum_buffers: Reduce pool size on Spectrum-2Petr Machata2019-06-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to an issue on Spectrum-2, in front-panel ports split four ways, 2 out of 32 port buffers cannot be used. To work around this, the next FW release will mark them as unused, and will report correspondingly lower total shared buffer size. mlxsw will pick up the new value through a query to cap_total_buffer_size resource. However the initial size for shared buffer pool 0 is hard-coded and therefore needs to be updated. Thus reduce the pool size by 2.7 MiB (which corresponds to 2/32 of the total size of 42 MiB), and round down to the whole number of cells. Fixes: fe099bf682ab ("mlxsw: spectrum_buffers: Add Spectrum-2 shared buffer configuration") Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum_flower: Fix TOS matchingJiri Pirko2019-06-121-2/+2
| | | | | | | | | | | | | | | | | | | | The TOS value was not extracted correctly. Fix it. Fixes: 87996f91f739 ("mlxsw: spectrum_flower: Add support for ip tos") Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum_router: Refresh nexthop neighbour when it becomes deadIdo Schimmel2019-06-121-3/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver tries to periodically refresh neighbours that are used to reach nexthops. This is done by periodically calling neigh_event_send(). However, if the neighbour becomes dead, there is nothing we can do to return it to a connected state and the above function call is basically a NOP. This results in the nexthop never being written to the device's adjacency table and therefore never used to forward packets. Fix this by dropping our reference from the dead neighbour and associating the nexthop with a new neigbhour which we will try to refresh. Fixes: a7ff87acd995 ("mlxsw: spectrum_router: Implement next-hop routing") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum: Use different seeds for ECMP and LAG hashIdo Schimmel2019-06-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The same hash function and seed are used for both ECMP and LAG hash. Therefore, when a LAG device is used as a nexthop device as part of an ECMP group, hash polarization can occur and all the traffic will be hashed to a single LAG slave. Fix this by using a different seed for the LAG hash. Fixes: fa73989f2697 ("mlxsw: spectrum: Use a stable ECMP/LAG seed") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Alex Veber <alexve@mellanox.com> Tested-by: Alex Veber <alexve@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx5e: Support tagged tunnel over bondEli Britstein2019-06-071-5/+6
| | | | | | | | | | | | | | | | | | | | | | Stacked devices like bond interface may have a VLAN device on top of them. Detect lag state correctly under this condition, and return the correct routed net device, according to it the encap header is built. Fixes: e32ee6c78efa ("net/mlx5e: Support tunnel encap over tagged Ethernet") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Avoid detaching non-existing netdev under switchdev modeAlaa Hleihel2019-06-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After introducing dedicated uplink representor, the netdev instance set over the esw manager vport (PF) became no longer in use, so it was removed in the cited commit once we're on switchdev mode. However, the mlx5e_detach function was not updated accordingly, and it still tries to detach a non-existing netdev, causing a kernel crash. This patch fixes this issue. Fixes: aec002f6f82c ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Fix source port matching in fdb peer flow ruleRaed Salem2019-06-071-3/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cited commit changed the initialization placement of the eswitch attributes so it is done prior to parse tc actions function call, including among others the in_rep and in_mdev fields which are mistakenly reassigned inside the parse actions function. This breaks the source port matching criteria of the peer redirect rule. Fix by removing the now redundant reassignment of the already initialized fields. Fixes: 988ab9c7363a ("net/mlx5e: Introduce mlx5e_flow_esw_attr_init() helper") Signed-off-by: Raed Salem <raeds@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Replace reciprocal_scale in TX select queue functionShay Agroskin2019-06-073-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The TX queue index returned by the fallback function ranges between [0,NUM CHANNELS - 1] if QoS isn't set and [0, (NUM CHANNELS)*(NUM TCs) -1] otherwise. Our HW uses different TC mapping than the fallback function (which is denoted as 'up', user priority) so we only need to extract a channel number out of the returned value. Since (NUM CHANNELS)*(NUM TCs) is a relatively small number, using reciprocal scale almost always returns zero. We instead access the 'txq2sq' table to extract the sq (and with it the channel number) associated with the tx queue, thus getting a more evenly distributed channel number. Perf: Rx/Tx side with Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz and ConnectX-5. Used 'iperf' UDP traffic, 10 threads, and priority 5. Before: 0.566Mpps After: 2.37Mpps As expected, releasing the existing bottleneck of steering all traffic to TX queue zero significantly improves transmission rates. Fixes: 7ccdd0841b30 ("net/mlx5e: Fix select queue callback") Signed-off-by: Shay Agroskin <shayag@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5e: Add ndo_set_feature for uplink representorChris Mi2019-06-073-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After we have a dedicated uplink representor, the new netdev ops doesn't support ndo_set_feature. Because of that, we can't change some features, eg. rxvlan. Now add it back. In this patch, I also do a cleanup for the features flag handling, eg. remove duplicate NETIF_F_HW_TC flag setting. Fixes: aec002f6f82c ("net/mlx5e: Uninstantiate esw manager vport netdev on switchdev mode") Signed-off-by: Chris Mi <chrism@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Avoid reloading already removed devicesAlaa Hleihel2019-06-071-2/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prior to reloading a device we must first verify that it was not already removed. Otherwise, the attempt to remove the device will do nothing, and in that case we will end up proceeding with adding an new device that no one was expecting to remove, leaving behind used resources such as EQs that causes a failure to destroy comp EQs and syndrome (0x30f433). Fix that by making sure that we try to remove and add a device (based on a protocol) only if the device is already added. Fixes: c5447c70594b ("net/mlx5: E-Switch, Reload IB interface when switching devlink modes") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * net/mlx5: Update pci error handler entries and command translationEdward Srouji2019-06-071-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | Add missing entries for create/destroy UCTX and UMEM commands. This could get us wrong "unknown FW command" error in flows where we unbind the device or reset the driver. Also the translation of these commands from opcodes to string was missing. Fixes: 6e3722baac04 ("IB/mlx5: Use the correct commands for UMEM and UCTX allocation") Signed-off-by: Edward Srouji <edwards@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | Merge tag 'mlx5-updates-2019-06-13' of ↵David S. Miller2019-06-1513-120/+1224
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2019-06-13 Mlx5 devlink health fw reporters and sw reset support This series provides mlx5 firmware reset support and firmware devlink health reporters. 1) Add initial mlx5 kernel documentation and include devlink health reporters 2) Add CR-Space access and FW Crdump snapshot support via devlink region_snapshot 3) Issue software reset upon FW asserts 4) Add fw and fw_fatal devlink heath reporters to follow fw errors indication by dump and recover procedures and enable trigger these functionality by user. 4.1) fw reporter: The fw reporter implements diagnose and dump callbacks. It follows symptoms of fw error such as fw syndrome by triggering fw core dump and storing it and any other fw trace into the dump buffer. The fw reporter diagnose command can be triggered any time by the user to check current fw status. 4.2) fw_fatal repoter: The fw_fatal reporter implements dump and recover callbacks. It follows fatal errors indications by CR-space dump and recover flow. The CR-space dump uses vsc interface which is valid even if the FW command interface is not functional, which is the case in most FW fatal errors. The CR-space dump is stored as a memory region snapshot to ease read by address. The recover function runs recover flow which reloads the driver and triggers fw reset if needed. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/mlx5: Report devlink health on FW fatal issuesMoshe Shemesh2019-06-132-22/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Report devlink health on FW fatal issues via fw_fatal_reporter. The driver recover flow for FW fatal error is now being handled by the devlink health. Having the recovery controlled by devlink health, the user has the ability to cancel the auto-recovery for debug session and run it manually. Call mlx5_enter_error_state() before calling devlink_health_report() to ensure entering device error state even if auto-recovery is off. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Add support for FW fatal reporter dumpMoshe Shemesh2019-06-131-0/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support of dump callback for mlx5 FW fatal reporter. The FW fatal dump uses cr-dump functionality to gather cr-space data for debug. The cr-dump uses vsc interface which is valid even if the FW command interface is not functional, which is the case in most FW fatal errors. Command example and output: $ devlink health dump show pci/0000:82:00.0 reporter fw_fatal crdump_data: 00 20 00 01 00 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ba 82 00 00 0c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa 00 a4 0e 00 00 00 00 00 00 80 c7 fe ff 50 0a 00 00 ... ... Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Add fw fatal devlink_health_reporterMoshe Shemesh2019-06-131-20/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create mlx5_devlink_health_reporter for fw fatal reporter. The fw fatal reporter is added in addition to the fw reporter and implements the recover callback. The point of having two reporters for FW issues, is that we don't want to run FW recover on any issue, but only fatal ones. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Report devlink health on FW issuesMoshe Shemesh2019-06-131-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use devlink_health_report() to report any symptom of FW issue as FW counter miss or new health syndrome. The FW issues detected in mlx5 during poll_health which is called in timer atomic context and so health work queue is used to schedule the reports. Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Add support for FW reporter dumpMoshe Shemesh2019-06-133-0/+270
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support of dump callback for mlx5 FW reporter. Once we trigger FW dump, the FW will write the core dump to its raw data buffer. The tracer translates the raw data to traces and save it to a cyclic array. Once dump is done, the saved traces data is filled into the dump buffer. In case syndrome is not zero the health buffer content will be printed as well. FW dump example: $ devlink health dump show pci/0000:82:00.0 reporter fw dump fw traces: timestamp: 509006640427 lost: false event_id: 185 msg: dump general info GVMI=0x0000 timestamp: 509006645474 lost: false event_id: 185 msg: GVMI management info, gvmi_management context: timestamp: 509006654463 lost: false event_id: 185 msg: [000]: 00000000 00000000 00000000 00000000 timestamp: 509006656127 lost: false event_id: 185 msg: [010]: 00000000 00000000 00000000 00000000 timestamp: 509006656255 lost: false event_id: 185 msg: [020]: 00000000 00000000 00000000 00000000 timestamp: 509006656511 lost: false event_id: 185 msg: [030]: 00000000 00000000 00000000 00000000 timestamp: 509006656639 lost: false event_id: 185 msg: [040]: 00000000 00000000 00000000 00000000 timestamp: 509006656895 lost: false event_id: 185 msg: [050]: 00000000 00000000 00000000 00000000 timestamp: 509006657023 lost: false event_id: 185 msg: [060]: 00000000 00000000 00000000 00000000 timestamp: 509006657180 lost: false event_id: 185 msg: [070]: 00000000 00000000 00000000 00000000 timestamp: 509006659839 lost: false event_id: 185 msg: CMDIF dbase from IRON: active_dbase_slots = 0x00000000 timestamp: 509006667391 lost: false event_id: 185 msg: GVMI=0x0000 hw_toc context: timestamp: 509006667647 lost: false event_id: 185 msg: [000]: 00000000 00000000 00000000 fffff000 timestamp: 509006667775 lost: false event_id: 185 msg: [010]: 00000000 00000000 00000000 80d00000 ... ... Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Create FW devlink_health_reporterMoshe Shemesh2019-06-131-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create mlx5_devlink_health_reporter for FW reporter. The FW reporter implements devlink_health_reporter diagnose callback. The fw reporter diagnose command can be triggered any time by the user to check current fw status. In healthy status, it will return clear syndrome. Otherwise it will return the syndrome and description of the error type. Command example and output on healthy status: $ devlink health diagnose pci/0000:82:00.0 reporter fw Syndrome: 0 Command example and output on non healthy status: $ devlink health diagnose pci/0000:82:00.0 reporter fw Syndrome: 8 Description: unrecoverable hardware error Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Issue SW reset on FW assertFeras Daoud2019-06-134-7/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a FW assert is considered fatal, indicated by a new bit in the health buffer, reset the FW. After the reset go through the normal recovery flow. Only one PF needs to issue the reset, so an attempt is made to prevent the 2nd function from also issuing the reset. It's not an error if that happens, it just slows recovery. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Control CR-space access by different PFsFeras Daoud2019-06-133-5/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the FW can be shared between different PFs/VFs it is common that more than one health poll will detected a failure, this can lead to multiple resets which are unneeded. The solution is to use a FW locking mechanism using semaphore space to provide a way to allow only one device to collect the cr-dump and to issue a sw-reset. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Handle SW reset of FW in error flowFeras Daoud2019-06-134-64/+47Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | New mlx5 adapters allow the driver to reset the FW in the event of an error, this action called "SW Reset". When an SW reset is issued on any PF all PFs enter reset state which is a recoverable condition. The existing recovery flow was designed to allow the recovery of a VF after a PF driver reload. This patch adds the sw reset to the NIC states as a preparation for sw reset handling. When a software reset is issued the following occurs: 1. The NIC interface mode is set to 7 while the reset is in progress. 2. Once the reset completes the NIC interface mode is set to 1. Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Daniel Jurgens <danielj@mellanox.com> Reviewed-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Add Crdump supportAlex Vesker2019-06-134-1/+115
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Crdump allows the driver to retrieve a dump of the FW PCI crspace. This is useful in case of catastrophic issues which may require FW reset. The crspace dump can be used for later debug. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Moshe Shemesh <moshe@mellanox.com> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Add Vendor Specific Capability access gatewayAlex Vesker2019-06-134-1/+315
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Vendor Specific Capability (VSC) is used to activate a gateway interfacing with the device. The gateway is used to read or write device configurations, which are organized in different domains (spaces). A configuration access may result in multiple actions, reads, writes. Example usages are accessing the Crspace domain to read the crspace or locking a device semaphore using the Semaphore domain. The configuration access use pci_cfg_access to prevent parallel access to the VSC space by the driver and userspace calls. Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
| * | net/mlx5: Move all devlink related functions calls to devlink.cEran Ben Elisha2019-06-134-38/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Centralize all devlink related callbacks in one file. In the downstream patch, some more functionality will be added, this patch is preparing the driver infrastructure for it. Currently, move devlink un/register functions calls into this file. Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Reviewed-by: Moshe Shemesh <moshe@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* | | net/mlx5e: use indirect calls wrapper for the rx packet handlerPaolo Abeni2019-06-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can avoid another indirect call per packet wrapping the rx handler call with the proper helper. To ensure that even the last listed direct call experience measurable gain, despite the additional conditionals we must traverse before reaching it, I tested reversing the order of the listed options, with performance differences below noise level. Together with the previous indirect call patch, this gives ~6% performance improvement in raw UDP tput. v2 -> v3: - use only the direct calls always available regardless of the mlx5 build options - drop the direct call list macro, to keep the code as simple as possible for future rework v1 -> v2: - update the direct call list and use a macro to define it, as per Saeed suggestion. An intermediated additional macro is needed to allow arg list expansion Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>