summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* net/sched: act_skbedit: don't use spinlock in the data pathDavide Caratti2018-07-122-49/+95
| | | | | | | | | use RCU instead of spin_{,un}lock_bh, to protect concurrent read/write on act_skbedit configuration. This reduces the effects of contention in the data path, in case multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/sched: skbedit: use per-cpu countersDavide Caratti2018-07-121-4/+4
| | | | | | | | use per-CPU counters, instead of sharing a single set of stats with all cores: this removes the need of spinlocks when stats are read/updated. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: use monotonic timestamps for PAWSArnd Bergmann2018-07-126-16/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using get_seconds() for timestamps is deprecated since it can lead to overflows on 32-bit systems. While the interface generally doesn't overflow until year 2106, the specific implementation of the TCP PAWS algorithm breaks in 2038 when the intermediate signed 32-bit timestamps overflow. A related problem is that the local timestamps in CLOCK_REALTIME form lead to unexpected behavior when settimeofday is called to set the system clock backwards or forwards by more than 24 days. While the first problem could be solved by using an overflow-safe method of comparing the timestamps, a nicer solution is to use a monotonic clocksource with ktime_get_seconds() that simply doesn't overflow (at least not until 136 years after boot) and that doesn't change during settimeofday(). To make 32-bit and 64-bit architectures behave the same way here, and also save a few bytes in the tcp_options_received structure, I'm changing the type to a 32-bit integer, which is now safe on all architectures. Finally, the ts_recent_stamp field also (confusingly) gets used to store a jiffies value in tcp_synq_overflow()/tcp_synq_no_recent_overflow(). This is currently safe, but changing the type to 32-bit requires some small changes there to keep it working. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/tls: Use aead_request_alloc/free for request alloc/freeVakul Garg2018-07-121-8/+4Star
| | | | | | | | | | | Instead of kzalloc/free for aead_request allocation and free, use functions aead_request_alloc(), aead_request_free(). It ensures that any sensitive crypto material held in crypto transforms is securely erased from memory. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tc-testing: add geneve options in tunnel_key unit testsPieter Jansen van Vuuren2018-07-121-0/+168
| | | | | | | | | | Extend tc tunnel_key action unit tests with geneve options. Tests include testing single and multiple geneve options, as well as testing geneve options that are expected to fail. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Acked-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'be2net-small-structures-clean-up'David S. Miller2018-07-122-32/+13Star
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ivan Vecera says: ==================== be2net: small structures clean-up The series: - removes unused / unneccessary fields in several be2net structures - re-order fields in some structures to eliminate holes, cache-lines crosses - as result reduces size of main struct be_adapter by 4kB ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: move rss_flags field in rss_info to ensure proper alignmentIvan Vecera2018-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current position of .rss_flags field in struct rss_info causes that fields .rsstable and .rssqueue (both 128 bytes long) crosses cache-line boundaries. Moving it at the end properly align all fields. Before patch: struct rss_info { u64 rss_flags; /* 0 8 */ u8 rsstable[128]; /* 8 128 */ /* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */ u8 rss_queue[128]; /* 136 128 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ u8 rss_hkey[40]; /* 264 40 */ }; After patch: struct rss_info { u8 rsstable[128]; /* 0 128 */ /* --- cacheline 2 boundary (128 bytes) --- */ u8 rss_queue[128]; /* 128 128 */ /* --- cacheline 4 boundary (256 bytes) --- */ u8 rss_hkey[40]; /* 256 40 */ u64 rss_flags; /* 296 8 */ }; Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: re-order fields in be_error_recovert to avoid holeIvan Vecera2018-07-121-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | - Unionize two u8 fields where only one of them is used depending on NIC chipset. - Move recovery_supported field after that union These changes eliminate 7-bytes hole in the struct and makes it smaller by 8 bytes. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: remove unused tx_jiffies field from be_tx_statsIvan Vecera2018-07-121-1/+0Star
| | | | | | | | | | Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: move txcp field in be_tx_obj to eliminate holes in the structIvan Vecera2018-07-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before patch: struct be_tx_obj { u32 db_offset; /* 0 4 */ /* XXX 4 bytes hole, try to pack */ struct be_queue_info q; /* 8 56 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct be_queue_info cq; /* 64 56 */ struct be_tx_compl_info txcp; /* 120 4 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 2 boundary (128 bytes) --- */ struct sk_buff * sent_skb_list[2048]; /* 128 16384 */ ... }: After patch: struct be_tx_obj { u32 db_offset; /* 0 4 */ struct be_tx_compl_info txcp; /* 4 4 */ struct be_queue_info q; /* 8 56 */ /* --- cacheline 1 boundary (64 bytes) --- */ struct be_queue_info cq; /* 64 56 */ struct sk_buff * sent_skb_list[2048]; /* 120 16384 */ ... }; Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: reorder fields in be_eq_obj structureIvan Vecera2018-07-121-2/+2
| | | | | | | | | | | | | | | | | | | | Re-order fields in struct be_eq_obj to ensure that .napi field begins at start of cache-line. Also the .adapter field is moved to the first cache-line next to .q field and 3 fields (idx,msi_idx,spurious_intr) and the 4-bytes hole to 3rd cache-line. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: remove desc field from be_eq_objIvan Vecera2018-07-122-3/+4
| | | | | | | | | | | | | | | | | | The event queue description (be_eq_obj.desc) field is used only to format string for IRQ name and it is not really needed to hold this value. Remove it and use local variable to format string for IRQ name. Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: remove unused old custom busy-poll fieldsIvan Vecera2018-07-121-13/+0Star
| | | | | | | | | | | | | | | | | | | | | | The commit fb6113e688e0 ("be2net: get rid of custom busy poll code") replaced custom busy-poll code by the generic one but left several macros and fields in struct be_eq_obj that are currently unused. Remove this stuff. Fixes: fb6113e688e0 ("be2net: get rid of custom busy poll code") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * be2net: remove unused old AIC infoIvan Vecera2018-07-121-7/+0Star
|/ | | | | | | | | | The commit 2632bafd74ae ("be2net: fix adaptive interrupt coalescing") introduced a separate struct be_aic_obj to hold AIC information but unfortunately left the old stuff in be_eq_obj. So remove it. Fixes: 2632bafd74ae ("be2net: fix adaptive interrupt coalescing") Signed-off-by: Ivan Vecera <cera@cera.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethernet: ti: cpts: break cycle once late ts is matchedIvan Khoronzhuk2018-07-121-1/+4
| | | | | | | | The late ts queue can contain a bunch of skbs while hi rate testing, no need to check all of them if timestamp is already matched. Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* selftests: forwarding: mirror_gre_nh: Unset rp_filter on host VRFPetr Machata2018-07-121-0/+4
| | | | | | | | | | | | | The mirrored packets arrive at $h3 encapsulated in GRE/IPv4, with IP address from 192.0.2.128/28 network. However the interface is configured as a member of 192.0.2.160/28 and there's no route directing traffic from the former network through that interface. Correspondingly, the RP filter on the VRF rejects it. Therefore turn off the VRF's RP filter. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'mlxsw-ERSPAN-Take-LACP-state-into-consideration'David S. Miller2018-07-125-6/+52
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ido Schimmel says: ==================== mlxsw: ERSPAN: Take LACP state into consideration Petr says: When offloading mirror-to-gretap, mlxsw needs to preroute the path that the encapsulated packet will take. That path may include a LAG device above a front panel port. So far, mlxsw resolved the path to the first up front panel slave of the LAG interface, but that only reflects administrative state of the port. It neglects to consider whether the port actually has a carrier, and what the LACP state is. This patch set aims to address these problems. Patch #1 publishes team_port_get_rcu(). Then in patch #2, a new function is introduced, mlxsw_sp_port_dev_check(). That returns, for a given netdevice that is a slave of a LAG device, whether that device is "txable", i.e. whether the LAG master would send traffic through it. Since there's no good place to put LAG-wide helpers, introduce a new header include/net/lag.h. Finally in patch #3, fix the slave selection logic to take into consideration whether a given slave has a carrier and whether it is txable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum_span: Change LAG lower selectionPetr Machata2018-07-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When offloading mirror-to-gretap, mlxsw needs to preroute the path that the encapsulated packet will take. That path may include a LAG device above a front panel port. So far, mlxsw resolved the path to the first up front panel slave of the LAG interface, but that only reflects administrative state of the port. It neglects to consider whether the port actually has a carrier, and what the LACP state is. So instead of checking upness of the device, check carrier state and txability. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Add lag.h, net_lag_port_dev_txable()Petr Machata2018-07-123-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LAG devices (team or bond) recognize for each one of their slave devices whether LAG traffic is going to be sent through that device. Bond calls such devices "active", team calls them "txable". When this state changes, a NETDEV_CHANGELOWERSTATE notification is distributed, together with a netdev_notifier_changelowerstate_info structure that for LAG devices includes a tx_enabled flag that refers to the new state. The notification thus makes it possible to react to the changes in txability in drivers. However there's no way to query txability from the outside on demand. That is problematic namely for mlxsw, which when resolving ERSPAN packet path, may encounter a LAG device, and needs to determine which of the slaves it should choose. To that end, introduce a new function, net_lag_port_dev_txable(), which determines whether a given slave device is "active" or "txable" (depending on the flavor of the LAG device). That function then dispatches to per-LAG-flavor helpers, bond_is_active_slave_dev() resp. team_port_dev_txable(). Because there currently is no good place where net_lag_port_dev_txable() should be added, introduce a new header file, lag.h, which should from now on hold any logic common to both team and bond. (But keep netif_is_lag_master() together with the rest of netif_is_*_master() functions). Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * team: Publish team_port_get_rcu()Petr Machata2018-07-122-5/+5
|/ | | | | | | | | | | | | | | | | | | A follow-up patch adds a new entry point, team_port_dev_txable(). Making it an ordinary exported function would mean that any module that may need the service in one of the supported configurations also unconditionally needs to pull in the team module, whether or not the user actually intends to create team interfaces. To prevent that, team_port_dev_txable() is defined in if_team.h, and therefore all dependencies of that function also need to be publicly-visible. Therefore move team_port_get_rcu() from team.c to if_team.h. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvlan: Change status when lower device goes downTravis Brown2018-07-121-0/+1
| | | | | | | | | | | | | | | | Today macvlan ignores the notification when a lower device goes administratively down, preventing the lack of connectivity from bubbling up. Processing NETDEV_DOWN results in a macvlan state of LOWERLAYERDOWN with NO-CARRIER which should be easy to interpret in userspace. 2: lower: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000 3: macvlan@lower: <NO-CARRIER,BROADCAST,MULTICAST,UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000 Signed-off-by: Suresh Krishnan <skrishnan@arista.com> Signed-off-by: Travis Brown <travisb@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'tipc-make-link-protocol-more-resilient'David S. Miller2018-07-124-24/+80
|\ | | | | | | | | | | | | | | | | | | | | | | | | Jon Maloy says: ==================== tipc: make link protocol more resilient These two commits make the link ptotocol more resilient to infrastructures with frequent packet duplication and long delays. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * tipc: check session number before accepting link protocol messagesJon Maloy2018-07-123-22/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some virtual environments we observe a significant higher number of packet reordering and delays than we have been used to traditionally. This makes it necessary with stricter checks on incoming link protocol messages' session number, which until now only has been validated for RESET messages. Since the other two message types, ACTIVATE and STATE messages also carry this number, it is easy to extend the validation check to those messages. We also introduce a flag indicating if a link has a valid peer session number or not. This eliminates the mixing of 32- and 16-bit arithmethics we are currently using to achieve this. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * tipc: add sequence number check for link STATE messagesJon Maloy2018-07-124-6/+32
|/ | | | | | | | | | | | | | | | | | Some switch infrastructures produce huge amounts of packet duplicates. This becomes a problem if those messages are STATE/NACK protocol messages, causing unnecessary retransmissions of already accepted packets. We now introduce a unique sequence number per STATE protocol message so that duplicates can be identified and ignored. This will also be useful when tracing such cases, and to avert replay attacks when TIPC is encrypted. For compatibility reasons we have to introduce a new capability flag TIPC_LINK_PROTO_SEQNO to handle this new feature. Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch '10GbE' of ↵David S. Miller2018-07-1235-131/+312
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== L2 Fwd Offload & 10GbE Intel Driver Updates 2018-07-09 This patch series is meant to allow support for the L2 forward offload, aka MACVLAN offload without the need for using ndo_select_queue. The existing solution currently requires that we use ndo_select_queue in the transmit path if we want to associate specific Tx queues with a given MACVLAN interface. In order to get away from this we need to repurpose the tc_to_txq array and XPS pointer for the MACVLAN interface and use those as a means of accessing the queues on the lower device. As a result we cannot offload a device that is configured as multiqueue, however it doesn't really make sense to configure a macvlan interfaced as being multiqueue anyway since it doesn't really have a qdisc of its own in the first place. The big changes in this set are: Allow lower device to update tc_to_txq and XPS map of offloaded MACVLAN Disable XPS for single queue devices Replace accel_priv with sb_dev in ndo_select_queue Add sb_dev parameter to fallback function for ndo_select_queue Consolidated ndo_select_queue functions that appeared to be duplicates ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: allow fallback function to pass netdevAlexander Duyck2018-07-0914-27/+24Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For most of these calls we can just pass NULL through to the fallback function as the sb_dev. The only cases where we cannot are the cases where we might be dealing with either an upper device or a driver that would have configured things to support an sb_dev itself. The only driver that has any significant change in this patch set should be ixgbe as we can drop the redundant functionality that existed in both the ndo_select_queue function and the fallback function that was passed through to us. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * net: allow ndo_select_queue to pass netdevAlexander Duyck2018-07-0929-42/+66
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes it so that instead of passing a void pointer as the accel_priv we instead pass a net_device pointer as sb_dev. Making this change allows us to pass the subordinate device through to the fallback function eventually so that we can keep the actual code in the ndo_select_queue call as focused on possible on the exception cases. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * net: Add generic ndo_select_queue functionsAlexander Duyck2018-07-096-26/+22Star
| | | | | | | | | | | | | | | | | | | | | | This patch adds a generic version of the ndo_select_queue functions for either returning 0 or selecting a queue based on the processor ID. This is generally meant to just reduce the number of functions we have to change in the future when we have to deal with ndo_select_queue changes. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * net: Add support for subordinate traffic classes to netdev_pick_txAlexander Duyck2018-07-094-46/+45Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change makes it so that we can support the concept of subordinate device traffic classes to the core networking code. In doing this we can start pulling out the driver specific bits needed to support selecting a queue based on an upper device. The solution at is currently stands is only partially implemented. I have the start of some XPS bits in here, but I would still need to allow for configuration of the XPS maps on the queues reserved for the subordinate devices. For now I am using the reference to the sb_dev XPS map as just a way to skip the lookup of the lower device XPS map for now as that would result in the wrong queue being picked. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * ixgbe: Add code to populate and use macvlan TC to Tx queue mapAlexander Duyck2018-07-091-6/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes it so that we use the tc_to_txq mapping in the macvlan device in order to select the Tx queue for outgoing packets. The idea here is to try and move away from using ixgbe_select_queue and to come up with a generic way to make this work for devices going forward. By encoding this information in the netdev this can become something that can be used generically as a solution for similar setups going forward. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * net: Add support for subordinate device traffic classesAlexander Duyck2018-07-093-2/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is meant to provide the basic tools needed to allow us to create subordinate device traffic classes. The general idea here is to allow subdividing the queues of a device into queue groups accessible through an upper device such as a macvlan. The idea here is to enforce the idea that an upper device has to be a single queue device, ideally with IFF_NO_QUQUE set. With that being the case we can pretty much guarantee that the tc_to_txq mappings and XPS maps for the upper device are unused. As such we could reuse those in order to support subdividing the lower device and distributing those queues between the subordinate devices. In order to distinguish between a regular set of traffic classes and if a device is carrying subordinate traffic classes I changed num_tc from a u8 to a s16 value and use the negative values to represent the subordinate pool values. So starting at -1 and running to -32768 we can encode those as pool values, and the existing values of 0 to 15 can be maintained. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * net-sysfs: Drop support for XPS and traffic_class on single queue deviceAlexander Duyck2018-07-091-2/+13
| | | | | | | | | | | | | | | | | | | | | | | | This patch makes it so that we do not report the traffic class or allow XPS configuration on single queue devices. This is mostly to avoid unnecessary complexity with changes I have planned that will allow us to reuse the unused tc_to_txq and XPS configuration on a single queue device to allow it to make use of a subset of queues on an underlying device. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | tcp: expose both send and receive intervals for rate sampleDeepti Raghavan2018-07-122-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | Congestion control algorithms, which access the rate sample through the tcp_cong_control function, only have access to the maximum of the send and receive interval, for cases where the acknowledgment rate may be inaccurate due to ACK compression or decimation. Algorithms may want to use send rates and receive rates as separate signals. Signed-off-by: Deepti Raghavan <deeptir@mit.edu> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: fix unprotected access to rcu cookie pointerVlad Buslov2018-07-121-2/+7
| | | | | | | | | | | | | | | | | | | | | | Fix action attribute size calculation function to take rcu read lock and access act_cookie pointer with rcu dereference. Fixes: eec94fdb0480 ("net: sched: use rcu for action cookie update") Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'cxgb4-move-stats-fetched-from-firmware-to-debugfs'David S. Miller2018-07-122-133/+164
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rahul Lakkireddy says: ==================== cxgb4: move stats fetched from firmware to debugfs Some stats are fetched via slow firmware mailbox, which can cause packet drops under heavy load. So, this series removes these stats from ethtool -S and expose them via debugfs. Patch 1 removes stats fetched via firmware from ethtool -S. Patch 2 exposes stats removed in Patch 1 via debugfs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | cxgb4: expose stats fetched from firmware via debugfsRahul Lakkireddy2018-07-121-0/+164
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expose stats obtained from firmware via debugfs. These stats can't be part of ethtool -S because the slow firmware mailbox can cause packet drops under heavy load. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | cxgb4: remove stats fetched from firmwareRahul Lakkireddy2018-07-121-133/+0Star
|/ / | | | | | | | | | | | | | | | | | | | | | | When running ethtool -S, some stats are requested from firmware. Since getting these stats via firmware mailbox is slow, some packets get dropped under heavy load while running ethtool -S. So, remove these stats from ethtool -S. Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: mvpp2: explicitly include linux/interrupt.hAntoine Tenart2018-07-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | The Marvell PPv2 driver uses interrupts and tasklet but does not explicitly include linux/interrupt.h, relying on implicit includes. This one particularly is included by chance after a long unlogical chain of inclusions. Fix this so we do not get future build breaks. Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cnic: use kvzalloc to allocate memory for csk_tblJan Dakinevich2018-07-121-3/+3
| | | | | | | | | | | | | | | | Size of csk_tbl is about 58K, which means 3rd order page allocation. kvzalloc provides a fallback if no high order memory is available. Signed-off-by: Jan Dakinevich <jan.dakinevich@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | wimax/i2400m: remove redundant variables ack_status, bcf and protocolColin Ian King2018-07-123-6/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Variables ack_status, bcf and protocol are being assigned but are never used hence they are redundant and can be removed. Also declare ack_type as unsigned int rather than unsigned to clean up a checkpatch warning. Cleans up clang warnings: warning: variable 'ack_status' set but not used [-Wunused-but-set-variable] warning: variable 'bcf' set but not used [-Wunused-but-set-variable] warning: variable 'protocol' set but not used [-Wunused-but-set-variable] Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: sched: act_ife: fix memory leak in ife initVlad Buslov2018-07-121-1/+3
| | | | | | | | | | | | | | | | | | Free params if tcf_idr_check_alloc() returned error. Fixes: 0190c1d452a9 ("net: sched: atomically check-allocate action") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | cxgb4: specify IQTYPE in fw_iq_cmdArjun Vynipadath2018-07-122-1/+15
| | | | | | | | | | | | | | | | | | congestion argument passed to t4_sge_alloc_rxq() is used to differentiate between nic/ofld queues. Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'net-ipv6-addr_gen_mode-fixes'David S. Miller2018-07-122-16/+39
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sabrina Dubroca says: ==================== net/ipv6: addr_gen_mode fixes This series fixes bugs in handling of the addr_gen_mode option, mainly related to the sysctl. A minor netlink issue was also present in the initial commit introducing the option on a per-netdevice basis. v2: add patch 4, requested by David Ahern during review of v1 add patch 5, missing documentation for the sysctl patches 1, 2, 3 are unchanged ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Documentation: ip-sysctl.txt: document addr_gen_modeSabrina Dubroca2018-07-121-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | addr_gen_mode was introduced in without documentation, add it now. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/ipv6: propagate net.ipv6.conf.all.addr_gen_mode to devicesSabrina Dubroca2018-07-121-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This aligns the addr_gen_mode sysctl with the expected behavior of the "all" variant. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/ipv6: reserve room for IFLA_INET6_ADDR_GEN_MODESabrina Dubroca2018-07-121-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inet6_ifla6_size() is called to check how much space is needed by inet6_fill_link_af() and inet6_fill_ifinfo(), both of which include the IFLA_INET6_ADDR_GEN_MODE attribute. Reserve some room for it. Fixes: bc91b0f07ada ("ipv6: addrconf: implement address generation modes") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/ipv6: don't reinitialize ndev->cnf.addr_gen_mode on new inet6_devSabrina Dubroca2018-07-121-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value has already been copied from this netns's devconf_dflt, it shouldn't be reset to the global kernel default. Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net/ipv6: fix addrconf_sysctl_addr_gen_modeSabrina Dubroca2018-07-121-13/+14
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | addrconf_sysctl_addr_gen_mode() has multiple problems. First, it ignores the errors returned by proc_dointvec(). addrconf_sysctl_addr_gen_mode() calls proc_dointvec() directly, which writes the value to memory, and then checks if it's valid and may return EINVAL. If a bad value is given, the value displayed when reading net.ipv6.conf.foo.addr_gen_mode next time will be invalid. In case the value provided by the user was valid, addrconf_dev_config() won't be called since idev->cnf.addr_gen_mode has already been updated. Fix this in the usual way we deal with values that need to be checked after the proc_do*() helper has returned: define a local ctl_table and storage, call proc_dointvec() on that temporary area, then check and store. addrconf_sysctl_addr_gen_mode() also writes the new value to the global ipv6_devconf_dflt, when we're writing to some netns's default, so that new netns will inherit the value that was set by the change occuring in any netns. That doesn't make any sense, so let's drop this assignment. Finally, since addr_gen_mode is a __u32, switch to proc_douintvec(). Fixes: d35a00b8e33d ("net/ipv6: allow sysctl to change link-local address generation mode") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/sched: flower: Fix null pointer dereference when run tc vlan commandJianbo Liu2018-07-121-22/+26
| | | | | | | | | | | | | | | | | | | | | | | | Zahari issued tc vlan command without setting vlan_ethtype, which will crash kernel. To avoid this, we must check tb[TCA_FLOWER_KEY_VLAN_ETH_TYPE] is not null before use it. Also we don't need to dump vlan_ethtype or cvlan_ethtype in this case. Fixes: d64efd0926ba ('net/sched: flower: Add supprt for matching on QinQ vlan headers') Signed-off-by: Jianbo Liu <jianbol@mellanox.com> Reported-by: Zahari Doychev <zahari.doychev@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | selftests: forwarding: mirror_lib: Tighten up VLAN capturePetr Machata2018-07-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function do_test_span_vlan_dir_ips() is used for testing whether mirrored packets are VLAN-encapsulated. But since it only considers VLAN encapsulation, it may end up matching unmirrored ARP traffic as well. One consequence is a rare failure of mirror_gre_vlan_bridge_1q's test_gretap_untagged_egress. Decreasing ping cadence in mirror_test() makes the problem easily reproducible. Therefore tighten up the match criterion to only count those 802.1q packets where the next header is IP. Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>