summaryrefslogtreecommitdiffstats
path: root/drivers/net/ethernet/mellanox/mlx5/core/en_rep.h
Commit message (Collapse)AuthorAgeFilesLines
* net/mlx5e: Register devlink ports for physical link, PCI PF, VFsParav Pandit2019-07-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Register devlink port of physical port, PCI PF and PCI VF flavour for each PF, VF when a given devlink instance is in switchdev mode. Implement ndo_get_devlink_port callback API to make use of registered devlink ports. This eliminates ndo_get_phys_port_name() and ndo_get_port_parent_id() callbacks. Hence, remove them. An example output with 2 VFs, without a PF and single uplink port is below. $devlink port show pci/0000:06:00.0/65535: type eth netdev ens2f0 flavour physical pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0 pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1 Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx5e: Rearrange tc tunnel code in a modular wayYevgeny Kliteynik2019-05-311-2/+1Star
| | | | | | | | | | | | | | Rearrange tc tunnel code so that it would be easy to add future tunnels: - Define tc tunnel object with the fields and callbacks that any tunnel must implement. - Define tc UDP tunnel object for UDP tunnels, such as VXLAN - Move each tunnel code (GRE, VXLAN) to its own separate file - Rewrite tc tunnel implementation in a general way - using only the objects and their callbacks. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Geneve, Keep tunnel info as pointer to the original structYevgeny Kliteynik2019-05-311-1/+1
| | | | | | | | | | | | | | | | | In mlx5e encap entry structure, IP tunnel info data structure is copied by value. This approach worked till now, but it breaks when there are encapsulation options, such as in case of Geneve. These options are stored in the structure that is allocated adjacent to the IP tunnel info struct, and not pointed at by any field in that struct. Therefore, when copying the struct by value, we loose the address of the original struct and can't get to the encapsulation options. Fix the problem by storing the pointer to the tunnel info data instead. Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* {IB,net}/mlx5: Constify rep ops functions pointersParav Pandit2019-05-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Currently for every representor type and for every single vport, representer function pointers copy is stored even though they don't change from one to other vport. Additionally priv data entry for the rep is not passed during registration, but its copied. It is used (set and cleared) by the user of the reps. As we want to scale vports, to simplify and also to split constants from data, 1. Rename mlx5_eswitch_rep_if to mlx5_eswitch_rep_ops as to match _ops prefix with other standard netdev, ibdev ops. 2. Constify the IB and Ethernet rep ops structure. 3. Instead of storing copy of all rep function pointers, store copy per eswitch rep type. 4. Split data and function pointers to mlx5_eswitch_rep_ops and mlx5_eswitch_rep_data. Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Mark Bloch <markb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* {IB, net}/mlx5: No need to typecast from void* to mlx5_ib_dev*Parav Pandit2019-05-311-1/+1
| | | | | | | | Avoid typecasting from void* to mlx5_ib_dev* or mlx5e_rep_priv* as it is not needed. Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Re-attempt to offload flows on multipath port affinity eventsRoi Dayan2019-03-011-0/+3
| | | | | | | | | | | | | | | | Under multipath it's possible for us to offload the flow only through the e-switch for which proper route through the uplink exists. When the port is up and the next-hop route is set again we want to offload through it as well. We generate SW event from the FIB event handler when multipath port affinity changes. The tc offloads code gets this event, goes over the flows which were marked as of having missing route and attempts to offload them. Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Fix GRE key by controlling port tunnel entropy calculationEli Britstein2019-02-221-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Flow entropy is calculated on the inner packet headers and used for flow distribution in processing, routing etc. For GRE-type encapsulations the entropy value is placed in the eight LSB of the key field in the GRE header as defined in NVGRE RFC 7637. For UDP based encapsulations the entropy value is placed in the source port of the UDP header. The hardware may support entropy calculation specifically for GRE and for all tunneling protocols. With commit df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading") GRE is offloaded, but the hardware is configured by default to calculate flow entropy so packets transmitted on the wire have a wrong key. To support UDP based tunnels (i.e VXLAN), GRE (i.e. no flow entropy) and NVGRE (i.e. with flow entropy) the hardware behaviour must be controlled by the driver. Ensure port entropy calculation is enabled for offloaded VXLAN tunnels and disable port entropy calculation in the presence of offloaded GRE tunnels by monitoring the presence of entropy enabling tunnels (i.e VXLAN) and entropy disabing tunnels (i.e GRE). Fixes: df2ef3bff193 ("net/mlx5e: Add GRE protocol offloading") Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Update hw flows when encap source mac changedTonghao Zhang2019-02-071-0/+1
| | | | | | | | | | | | | | | | | | | | When we offload tc filters to hardware, hardware flows can be updated when mac of encap destination ip is changed. But we ignore one case, that the mac of local encap ip can be changed too, so we should also update them. To fix it, add route_dev in mlx5e_encap_entry struct to save the local encap netdevice, and when mac changed, kernel will flush all the neighbour on the netdevice and send NETEVENT_NEIGH_UPDATE event. The mlx5 driver will delete the flows and add them when neighbour available again. Fixes: 232c001398ae ("net/mlx5e: Add support to neighbour update flow") Cc: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx5e: Fail attempt to offload e-switch TC flows with egress upper devicesEli Britstein2018-12-201-0/+3
| | | | | | | | | | | | | | | | | | We use the switchdev parent HW id helper to identify if the mirred device shares the same ASIC/port with the ingress device. This can get us wrong in the presence of upper devices such as vlan or bridge set over the HW devices (VF or uplink representors), b/c the switchdev ID is retrieved recursively. To fail offload attempts in such cases, we condition the check on the egress device to have not only the same switchdev ID but also the relevant mlx5 netdev ops. Fixes: 03a9d11e6eeb ('net/mlx5e: Add TC drop and mirred/redirect action parsing for SRIOV offloads') Signed-off-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Uninstantiate esw manager vport netdev on switchdev modeOr Gerlitz2018-12-171-4/+2Star
| | | | | | | | | | | | | | | | | | Now, when we have a dedicated uplink representor, the netdev instance set over the esw manager vport (PF) is of no-use. As such, remove it once we're on switchdev mode and get it back to life when off switchdev. This is done by reloading the Ethernet interface as well (we already do that for the IB interface) from the eswitch code while going in/out of switchdev mode. The Eth add/remove entries are modified to act differently when called in switchdev mode. In this case we only deal with registration of the eth vport representors. The rep netdevices are created from the eswitch call to load the registered eth representors. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Remove leftover code from the PF netdev being uplink repOr Gerlitz2018-12-171-3/+0Star
| | | | | | | | Remove some last leftovers from using the PF netdev as the e-switch uplink representor. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Use dedicated uplink vport netdev representorOr Gerlitz2018-12-171-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | Currently, when running in sriov switchdev mode, we are using the PF netdevice as the uplink representor, this is problematic from few aspects: - will break when the PF isn't eswitch manager (e.g smart NIC env) - misalignment with other NIC switchdev drivers - makes us have and maintain special code, hurts the driver quality/robustness - which in turn opens the door for future bugs As of each and all of the above, we move to have a dedicated netdev representor for the uplink vport in a similar manner done for for the VF vports. This includes the following: 1. have an uplink rep netdev as we have for VF reps 2. all reps use same load/unload functions 3. HW stats for uplink based on physical port counters and not vport counters 4. link state for the uplink managed through PAOS and not vport state 5. the uplink rep has sysfs link to the PF PCI function && uses the PF MAC address Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Branch according to classified tunnel typeOz Shlomo2018-12-111-0/+2
| | | | | | | | | | | | | | | | | | Currently the tunnel offloading encap/decap methods assumes that VXLAN is the sole tunneling protocol. Lay the infrastructure for supporting multiple tunneling protocols by branching according to the tunnel net device kind. Encap filters tunnel type is determined according to the egress/mirred net device. Decap filters classify the tunnel type according to the filter's ingress net device kind. Distinguish between the tunnel type as defined by the SW model and the FW reformat type that specifies the HW operation being made. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Support TC indirect block notifications for eswitch uplink reprsOz Shlomo2018-12-111-0/+13
| | | | | | | | | | | Towards using this mechanism as the means to offload tunnel decap rules set on SW tunnel devices instead of egdev, add the supporting structures and functions. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Store eswitch uplink representor state on a dedicated structOz Shlomo2018-12-111-1/+8
| | | | | | | | | | | | | | | | Currently only a single field in the representor private structure is relevant for uplink representors. As a pre-step to allow adding additional uplink representor fields, introduce uplink representor private structure. This is prepration step towards replacing egdev logic with the indirect block notification mechanism. This patch doesn't change any functionality. Signed-off-by: Oz Shlomo <ozsh@mellanox.com> Reviewed-by: Eli Britstein <elibr@mellanox.com> Acked-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Use shared table for offloaded TC eswitch flowsOr Gerlitz2018-05-181-0/+1
| | | | | | | | | | | | | | | | | | | | Currently, each representor netdev use their own hash table to keep the mapping from TC flow (f->cookie) to the driver offloaded instance. The table is the one which originally was added for offloading TC NIC (not eswitch) rules. This scheme breaks when the core TC code calls us to add the same flow twice, (e.g under egdev use case) since we don't spot that and offload a 2nd flow into the HW with the wrong source vport. As a pre-step to solve that, we move to use a single table which keeps all offloaded TC eswitch flows. The table is located at the eswitch uplink representor object. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: E-Switch, Move send-to-vport rule struct to en_repMark Bloch2017-12-281-0/+5
| | | | | | | | | Move struct mlx5_esw_sq which keeps send-to-vport rule to from the eswitch code to mlx5e and rename it to better reflect where it belongs Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5: E-Switch, Create generic header struct to be used by representorsMark Bloch2017-12-281-1/+1
| | | | | | | | | | | | | | | | Now that we don't store type dependent data in struct mlx5_eswitch_rep we can create a generic interface, and representor type. struct mlx5_eswitch_rep will store an array of interfaces, each interface is used by a different representor type. Once we moved to a more generic interface, rdma driver representors can be added and utilize the same mechanism as the Ethernet driver representors use. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Move ethernet representors data into separate structMark Bloch2017-12-281-0/+9
| | | | | | | | | | | | | | | | | Ethernet representors have a need to store data which is applicable only for them. Create a priv void pointer in struct mlx5_eswitch_rep and move mlx5e to store the relevant data there. As part of this change we also initialize rep_if in mlx5e_rep_register_vf_vports() as otherwise the E-Switch code will copy a priv value which is garbage. We also rename mlx5_eswitch_get_uplink_netdev() to mlx5_eswitch_get_uplink_priv() and make it return void *. This way E-Switch code doesn't need to deal with net devices and we leave the task of getting it to mlx5e. Signed-off-by: Mark Bloch <markb@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5: Add CONFIG_MLX5_ESWITCH KconfigSaeed Mahameed2017-08-071-0/+8
| | | | | | | | | | | | | | | Allow to selectively build the driver with or without sriov eswitch, VF representors and TC offloads. Also remove the need of two ndo ops structures (sriov & basic) and keep only one unified ndo ops, compile out VF SRIOV ndos when not needed (MLX5_ESWITCH=n), and for VF netdev calling those ndos will result in returning -EPERM. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Jes Sorensen <jsorensen@fb.com> Cc: kernel-team@fb.com
* net/mlx5e: NIC netdev init flow cleanupSaeed Mahameed2017-08-071-0/+1
| | | | | | | | | | | | | Remove redundant call to unregister vport representor in mlx5e_add error flow. Hide the representor priv and eswitch internal structures from en_main.c as preparation step for downstream patches which would allow building the driver without support for representors and eswitch. Fixes: 6f08a22c5fb2 ("net/mlx5e: Register/unregister vport representors on interface attach/detach") Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
* net/mlx5e: Update neighbour 'used' state using HW flow rules countersHadar Hen Zion2017-04-301-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | When IP tunnel encapsulation rules are offloaded, the kernel can't see the traffic of the offloaded flow. The neighbour for the IP tunnel destination of the offloaded flow can mistakenly become STALE and deleted by the kernel since its 'used' value wasn't changed. To make sure that a neighbour which is used by the HW won't become STALE, we proactively update the neighbour 'used' value every DELAY_PROBE_TIME period, when packets were matched and counted by the HW for one of the tunnel encap flows related to this neighbour. The periodic task that updates the used neighbours is scheduled when a tunnel encap rule is successfully offloaded into HW and keeps re-scheduling itself as long as the representor's neighbours list isn't empty. Add, remove, lookup and status change operations done over the representor's neighbours list or the neighbour hash entry encaps list are all serialized by RTNL lock. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Add support to neighbour update flowHadar Hen Zion2017-04-301-1/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to offload TC encap rules, the driver does a lookup for the IP tunnel neighbour according to the output device and the destination IP given by the user. To keep tracking after the validity state of such neighbours, we keep the neighbours information (pair of device pointer and destination IP) in a hash table maintained at the relevant egress representor and register to get NETEVENT_NEIGH_UPDATE events. When getting neighbour update netevent, we search for a match among the cached neighbours entries used for encapsulation. In case the neighbour isn't valid, we can't offload the flow into the HW. We cache the flow (requested matching and actions) in the driver and offload the rule later, when the neighbour is resolved and becomes valid. When a flow is only cached in the driver and not offloaded into HW yet, we use EAGAIN return value to mark it internally, the TC ndo still returns success. Listen to kernel neighbour update netevents to trace relevant neighbours validity state: 1. If a neighbour becomes valid, offload the related rules to HW. 2. If the neighbour becomes invalid, remove the related rules from HW. 3. If the neighbour mac address was changed, update the encap header. Remove all the offloaded rules using the old encap header from the HW and insert new rules to HW with updated encap header. Access to the neighbors hash table is protected by RTNL lock of its caller or by the table's spinlock. Details of the locking/synchronization among the different actions applied on the neighbour table: Add/remove operations - protected by RTNL lock of its caller (all TC commands are protected by RTNL lock). Add and remove operations are initiated only when the user inserts/removes a TC rule into/from the driver. Lookup/remove operations - since the lookup operation is done from netevent notifier block, RTNL lock can't be used (atomic context). Use the table's spin lock to protect lookups from TC user removal operation. bh is used since netevent can be called from a softirq context. Lookup/add operations - The hash table access functions are taking care of the protection between lookup and add operations. When adding/removing encap headers and rules to/from the HW, RTNL lock is used. It can happen when: 1. The user inserts/removes a TC rule into/from the driver (TC commands are protected by RTNL lock of it's caller). 2. The driver gets neighbour notification event, which reports about neighbour validity status change. Before adding/removing encap headers and rules to/from the HW, RTNL lock is taken. A neighbour hash table entry should be freed when its encap list is empty. Since The neighbour update netevent notification schedules a neighbour update work that uses the neighbour hash entry, it can't be freed unconditionally when the encap list becomes empty during TC delete rule flow. Use reference count to protect from freeing neighbour hash table entry while it's still in use. When the user asks to unregister a netdvice used by one of the neigbours, neighbour removal notification is received. Then we take a reference on the neighbour and don't free it until the relevant encap entries (and flows) are marked as invalid (not offloaded) and removed from HW. As long as the encap entry is still valid (checked under RTNL lock) we can safely access the neighbour device saved on mlx5e_neigh struct. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Add neighbour hash table to the representorsHadar Hen Zion2017-04-301-0/+30
| | | | | | | | | | | | | | | | | | | Add hash table to the representors which is to be used by the next patch to save neighbours information in the driver. In order to offload IP tunnel encapsulation rules, the driver must find the tunnel dst neighbour according to the output device and the destination address given by the user. The next patch will cache the neighbors information in the driver to allow support in neigh update flow for tunnel encap rules. The neighbour entries are also saved in a list so we easily iterate over them when querying statistics in order to provide 'used' feedback to the kernel neighbour NUD core. Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Move the encap entry structure from the eswitch headerOr Gerlitz2017-04-301-0/+13
| | | | | | | | | | | | The encap entry structure isn't manipulated by the eswitch code, hence it can/needs to be removed from the eswitch header. Do that, and change it to have mlx5e_ prefix. This patch doesn't change any functionality. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
* net/mlx5e: Extendable vport representor netdev private dataSaeed Mahameed2017-04-301-0/+55
Make representor netdev private data extendable by adding new struct "mlx5e_rep_priv" and use it as the rep netdev private data struct instead of directly pointing to mlx5_eswitch_rep. Added new en_rep.h header file to contain all representor related definitions and prototypes, and moved all representor specific logic into en_rep.c. Needed for downstream patches to extend representor functionality to support neighbour update. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>