openslx/kernel-qcow2-linux.git - In-kernel qcow2 (Kernel part)

	Commit message	Author	Age	Files	Lines
...
*	net: dsa: b53: Fix build with B53_SRAB enabled and not B53_SERDES	Florian Fainelli	2018-09-08	2	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In case B53_SRAB is enabled, but not B53_SERDES, we can get the following linking error: ERROR: "b53_serdes_init" [drivers/net/dsa/b53/b53_srab.ko] undefined! We also need to ifdef the body of b53_srab_serdes_map_lane() since it would not be used when B53_SERDES is disabled and that would produce a warning. Fixes: 0e01491de646 ("net: dsa: b53: Add SerDes support") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
*	cxgb4: impose mandatory VLAN usage when non-zero TAG ID	Casey Leedom	2018-09-08	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	When a non-zero VLAN Tag ID is passed to t4_set_vlan_acl() then impose mandatory VLAN Usage with that VLAN ID. I.e any other VLAN ID should result in packets getting dropped. Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	liquidio: lio_fetch_vf_stats() can be static	kbuild test robot	2018-09-08	1	-1/+1
\| \| \| \| \| \|	Fixes: 488752220b4a ("liquidio: Add spoof checking on a VF MAC address") Signed-off-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	nfp: replace spin_lock_bh with spin_lock in tasklet callback	jun qian	2018-09-08	1	-2/+2
\| \| \| \| \| \| \| \|	As you are already in a tasklet, it is unnecessary to call spin_lock_bh. Signed-off-by: jun qian <hangdianqj@163.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	net: dsa: Expose tagging protocol to user-space	Florian Fainelli	2018-09-07	4	-0/+79
\| \| \| \| \| \| \| \| \| \| \| \| \|	There is no way for user-space to know what a given DSA network device's tagging protocol is. Expose this information through a dsa/tagging attribute which reflects the tagging protocol currently in use. This is helpful for configuration (e.g: none behaves dramatically different wrt. bridges) as well as for packet capture tools when there is not a proper Ethernet type available. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	freescale: ethernet: remove unnecessary unlikely()	Igor Stoppa	2018-09-07	1	-5/+3
\| \| \| \| \| \| \| \| \| \| \| \|	Both WARN_ON() and WARN_ONCE() already contain an unlikely(), so it's not necessary to wrap it into another. Signed-off-by: Igor Stoppa <igor.stoppa@huawei.com> Cc: Madalin Bucur <madalin.bucur@nxp.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
*	bnxt_en: remove set but not used variable 'addr_type'	YueHaibing	2018-09-07	1	-15/+0
\| \| \| \| \| \| \| \| \| \| \|	Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c: In function 'bnxt_tc_parse_flow': drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c:186:6: warning: variable 'addr_type' set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	openvswitch: Derive IP protocol number for IPv6 later frags	Yi-Hung Wei	2018-09-07	1	-13/+9
\| 	Currently, OVS only parses the IP protocol number for the first IPv6 fragment, but sets the IP protocol number for the later fragments to be NEXTHDR_FRAGMENT. This patch tries to derive the IP protocol number for the later IPv6 fragments so that we can match on it. Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	liquidio CN23XX: Remove set but not used variable 'ring_flag'	YueHaibing	2018-09-07	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \|	Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c: In function 'cn23xx_setup_octeon_vf_device': drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.c:619:20: warning: variable 'ring_flag' set but not used [-Wunused-but-set-variable] Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	liquidio: Add spoof checking on a VF MAC address	Weilin Chang	2018-09-07	8	-12/+187
\| \| \| \| \| \| \| \| \| \| \| \|	1. Provide the API to set/unset the spoof checking feature. 2. Add a function to periodically provide the count of found packets with spoof VF MAC address. 3. Prevent VF MAC address changing while the spoofchk of the VF is on unless the changing MAC address is issued from PF. Signed-off-by: Weilin Chang <weilin.chang@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Merge tag 'mlx5e-updates-2018-09-05' of ↵	David S. Miller	2018-09-07	13	-153/+235
\|\ 	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5e-updates-2018-09-05 This series provides updates to mlx5 ethernet driver. 1) Starting with a four-patch series to optimize flow counter updates, from Vlad Buslov: ============================================== By default the mlx5 driver updates cached counters each second. The update function consumes a noticeable amount of CPU resources. The goal of this patch series is to optimize the update function. Investigation revealed the following bottlenecks in the fs counters implementation: 1) The update code (scheduled each second) iterates over all counters twice (the first iteration finds and deletes counters that are marked for deletion, the second actually updates the counters). 2) Counters are stored in an rb tree. Linear iteration over all rb tree elements (rb_next in profiling data) consumed ~65% of the time spent in the update function. The following optimizations were implemented: 1) Instead of just marking counters for deletion, store them in a standalone list. This removes the first iteration over the whole counters tree. 2) Store counters in a sorted list to optimize traversing them and remove calls to rb_next. The first implementation of these changes degraded performance instead of improving it. Investigation revealed that the first cache line of struct mlx5_fc is full and adding anything to it causes the number of cache misses to double. To mitigate that, the following refactorings were implemented: - Change the 'addlist' list type from double linked to single linked. This frees space for one additional pointer that is used to store the deletion list (optimization 1). - Substitute the rb tree with an idr. Idr is a non-intrusive data structure and doesn't require adding any new members to struct mlx5_fc. Use the free space that became available for a double linked sorted list that is used for traversing all counters (optimization 2). The described changes reduced CPU time spent in mlx5_fc_stats_work from 70% to 44% (global perf profile mode). ============================================ The rest of the series are misc updates: 2) From Kamal, Move mlx5e_priv_flags into en_ethtool.c, to avoid a compilation warning. 3) From Roi Dayan, Move Q counters allocation and drop RQ to init_rx profile function to avoid allocating Q counters when not required. 4) From Shay Agroskin, Replace PTP clock lock from RW lock to seq lock. Almost double the packet rate when timestamping is active on multiple TX queues. 5) From Natali Shechtman, set ECN for received packets using CQE indication. 6) From Alaa Hleihel, don't set CHECKSUM_COMPLETE on SCTP packets. CHECKSUM_COMPLETE is not applicable to SCTP protocol. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| *	net/mlx5e: don't set CHECKSUM_COMPLETE on SCTP packets	Alaa Hleihel	2018-09-06	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	CHECKSUM_COMPLETE is not applicable to SCTP protocol. Setting it for SCTP packets leads to CRC32c validation failure. Fixes: bbceefce9adf ("net/mlx5e: Support RX CHECKSUM_COMPLETE") Signed-off-by: Alaa Hleihel <alaa@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Set ECN for received packets using CQE indication	Natali Shechtman	2018-09-06	3	-5/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In multi-host (MH) NIC scheme, a single HW port serves multiple hosts or sockets on the same host. The HW uses a mechanism in the PCIe buffer which monitors the amount of consumed PCIe buffers per host. On a certain configuration, under congestion, the HW emulates a switch doing ECN marking on packets using ECN indication on the completion descriptor (CQE). The driver needs to set the ECN bits on the packet SKB, such that the network stack can react on that, this commit does that. Signed-off-by: Natali Shechtman <natali@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Replace PTP clock lock from RW lock to seq lock	Shay Agroskin	2018-09-06	3	-21/+23
\| \| 	Changed the "priv.clock.lock" lock from 'rw_lock' to 'seq_lock' in order to improve packet rate performance. Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz. Sent 64b packets between two peers connected by ConnectX-5, and measured packet rate for the receiver in three modes: (1) no time-stamping (base rate); (2) time-stamping using rw_lock (old lock) for the critical region; (3) time-stamping using seq_lock (new lock) for the critical region. Only the receiver time stamped its packets. The measured packet rate improvements are: Single flow (multiple TX rings to single RX ring): without timestamping: 4.26 (M packets)/sec; with rw-lock (old lock): 4.1 (M packets)/sec; with seq-lock (new lock): 4.16 (M packets)/sec; 1.46% improvement. Multiple flows (multiple TX rings to six RX rings): without timestamping: 22 (M packets)/sec; with rw-lock (old lock): 11.7 (M packets)/sec; with seq-lock (new lock): 21.3 (M packets)/sec; 82.05% improvement. The packet rate improvement is due to the lack of atomic operations for the 'readers' by the seq-lock. Since there are many more 'readers' than 'writers' contending on this lock, almost all atomic operations are saved. This results in a dramatic decrease in overall cache misses. Signed-off-by: Shay Agroskin <shayag@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
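A minimal userspace sketch of the reader/writer pattern this message relies on, written with C11 atomics rather than the kernel's seqlock API. The names `ts_clock`, `ts_clock_write` and `ts_clock_read` are hypothetical and are not taken from the mlx5 driver, and a single writer at a time is assumed. It is meant only to show why the read side is cheap: readers load the sequence counter and retry if it changed, so they never write to the shared cache line.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical clock state guarded by a sequence counter (not the mlx5 layout). */
struct ts_clock {
    atomic_uint seq;            /* odd while a write is in progress */
    _Atomic uint64_t cycles;
    _Atomic uint64_t nsec;
};

/* Writer side. Assumes writers are already serialized by the caller. */
static void ts_clock_write(struct ts_clock *c, uint64_t cycles, uint64_t nsec)
{
    unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed); /* mark busy */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->cycles, cycles, memory_order_relaxed);
    atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release); /* publish */
}

/* Reader side: loads only, retried if a writer was active or intervened. */
static void ts_clock_read(struct ts_clock *c, uint64_t *cycles, uint64_t *nsec)
{
    unsigned int s0, s1;

    do {
        s0 = atomic_load_explicit(&c->seq, memory_order_acquire);
        *cycles = atomic_load_explicit(&c->cycles, memory_order_relaxed);
        *nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s1 = atomic_load_explicit(&c->seq, memory_order_relaxed);
    } while ((s0 & 1u) || s0 != s1);
}
```

With a reader/writer lock, every reader modifies the lock word, so concurrent readers on different CPUs bounce the same cache line. Here that traffic only occurs when a writer updates the clock.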
\| *	net/mlx5e: Move Q counters allocation and drop RQ to init_rx	Roi Dayan	2018-09-06	4	-25/+55
\| \| 	Not all profiles query the HW Q counters in update_stats() callback. HW Q counters are limited per device and in case of representors all their Q counters are allocated on the parent PF device. Avoid redundant allocation of HW Q counters by moving the allocation to init_rx profile callback. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5e: Move mlx5e_priv_flags into en_ethtool.c	Kamal Heib	2018-09-06	2	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move the definition of mlx5e_priv_flags into en_ethtool.c because it's only used there. Fixes: 4e59e2888139 ("net/mlx5e: Introduce net device priv flags infrastructure") Signed-off-by: Kamal Heib <kamalheib1@gmail.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5: Add flow counters idr	Vlad Buslov	2018-09-06	2	-4/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previous patch in series changed flow counter storage structure from rb_tree to linked list in order to improve flow counter traversal performance. The drawback of such solution is that flow counter lookup by id becomes linear in complexity. Store pointers to flow counters in idr in order to improve lookup performance to logarithmic again. Idr is non-intrusive data structure and doesn't require extending flow counter struct with new elements. This means that idr can be used for lookup, while linked list from previous patch is used for traversal, and struct mlx5_fc size is <= 2 cache lines. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5: Store flow counters in a list	Vlad Buslov	2018-09-06	3	-50/+42
\| \| 	In order to improve performance of the flow counter stats query loop that traverses all configured flow counters, replace rb_tree with double-linked list. This change improves performance of traversing flow counters by removing the tree traversal. (profiling data showed that the call to rb_next was the top CPU consumer) However, lookup of a flow counter in the list becomes linear, instead of logarithmic. This problem is fixed by the next patch in the series, which adds idr for fast lookup. Idr is to be used because it is not an intrusive data structure and doesn't require adding any new members to struct mlx5_fc, which allows its control data part to stay <= 1 cache line in size. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5: Add new list to store deleted flow counters	Vlad Buslov	2018-09-06	3	-23/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to prevent flow counters stats work function from traversing whole flow counters tree while searching for deleted flow counters, new list to store deleted flow counters is added to struct mlx5_fc_stats. Lockless NULL-terminated single linked list data type is used due to following reasons: - This use case only needs to add single element to list and remove/iterate whole list. Lockless list doesn't require any additional synchronization for these operations. - First cache line of flow counter data structure only has space to store single additional pointer, which precludes usage of double linked list. Remove flow counter 'deleted' flag that is no longer needed. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
\| *	net/mlx5: Change flow counters addlist type to single linked list	Vlad Buslov	2018-09-06	3	-29/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to prevent flow counters stats work function from traversing whole flow counters tree while searching for deleted flow counters, new list to store deleted flow counters will be added to struct mlx5_fc_stats. However, the flow counter structure itself has no space left to store any more data in first cache line. To free space that is needed to store additional list node, convert current addlist double linked list (two pointers per node) to atomic single linked list (one pointer per node). Lockless NULL-terminated single linked list data type doesn't require any additional external synchronization for operations used by flow counters module (add single new element, remove all elements from list and traverse them). Remove addlist_lock that is no longer needed. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Acked-by: Amir Vadai <amir@vadai.me> Reviewed-by: Paul Blakey <paulb@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
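The three operations named in this message (add a single element, remove all elements, traverse them) can be sketched in userspace with C11 atomics. This is an illustration under assumptions, not the kernel's llist implementation: `lnode`, `lhead`, `lhead_add`, `lhead_take_all` and `lnode_count` are hypothetical names.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical node and head types: one pointer per node, NULL-terminated. */
struct lnode {
    struct lnode *next;
};

struct lhead {
    _Atomic(struct lnode *) first;
};

/* Push one node; safe against concurrent pushes and a concurrent take-all. */
static void lhead_add(struct lhead *h, struct lnode *n)
{
    struct lnode *old = atomic_load_explicit(&h->first, memory_order_relaxed);

    do {
        n->next = old;   /* 'old' is refreshed by a failed compare-exchange */
    } while (!atomic_compare_exchange_weak_explicit(&h->first, &old, n,
                                                    memory_order_release,
                                                    memory_order_relaxed));
}

/* Detach the whole chain in one atomic exchange; the caller then owns it. */
static struct lnode *lhead_take_all(struct lhead *h)
{
    return atomic_exchange_explicit(&h->first, (struct lnode *)NULL,
                                    memory_order_acquire);
}

/* Walk a detached chain; no synchronization is needed at this point. */
static size_t lnode_count(const struct lnode *n)
{
    size_t count = 0;

    for (; n != NULL; n = n->next)
        count++;
    return count;
}
```

Because nodes are only ever removed all at once, no external lock is needed for this set of operations, which is consistent with the message's removal of addlist_lock. Removing a single node concurrently would need extra care and is outside this sketch.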
* \|	Merge branch 'dsa-b53-SerDes-support'	David S. Miller	2018-09-06	7	-29/+805
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Florian Fainelli says: ==================== net: dsa: b53: SerDes support This patch series adds support for the SerDes found on NorthStar Plus (NSP) which allows us to use the SFP port on the BCM958625HR board (and other similar designs). Changes in v3: - properly hunk the request_threaded_irq() bits into patch #2 Changes in v2: - migrate to threaded interrupt (Andrew) - fixed a case where MLO_AN_FIXED's mac_config would still call into the serdes_config callback - added an additional check on the phylink interface in mac_config - default to ARCH_BCM_NSP instead of ARCH_BCM_IPROC which is really the NSP Kconfig bit we want ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: dsa: b53: Add SerDes support	Florian Fainelli	2018-09-06	7	-0/+490
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for the Northstar Plus SerDes which is accessed through a special page of the switch. Since this is something that most people probably will not want to use, make it a configurable option with a default on ARCH_BCM_NSP where it is the most useful currently. The SerDes supports both SGMII and 1000baseX modes for both lanes, and 2500baseX for one of the lanes, and is internally looking like a seemingly standard MII PHY, except for the few bits that got repurposed. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: dsa: b53: Add PHYLINK support	Florian Fainelli	2018-09-06	2	-0/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for PHYLINK, things are reasonably straight forward since we do not yet support SerDes interfaces, that leaves us with just MLO_AN_PHY and MLO_AN_FIXED to deal with. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: dsa: b53: Add helper to set link parameters	Florian Fainelli	2018-09-06	1	-29/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Extract the logic from b53_adjust_link() responsible for overriding a given port's link, speed, duplex and pause settings and make two helper functions to set the port's configuration and the port's link settings. We will make use of both, as separate functions while adding PHYLINK support next. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: dsa: b53: Make SRAB driver manage port interrupts	Florian Fainelli	2018-09-06	1	-0/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the SRAB driver to manage per-port interrupts. Since we cannot sleep during b53_io_ops, schedule a workqueue whenever we get a port specific interrupt. We will later make use of this to call back into PHYLINK when there is e.g: a link state change. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: dsa: b53: Add ability to enable/disable port interrupts	Florian Fainelli	2018-09-06	2	-0/+11
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some switches expose individual interrupt line(s) for port specific event(s), allow configuring these interrupts at an appropriate time during port_enable/disable callbacks where all port specific resources are known to be set-up and ready for use. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	qed*: Utilize FW 8.37.7.0	Denis Bolotin	2018-09-06	6	-202/+367
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds a new qed firmware with fixes and support for new features. Fixes: - Fix a rare case of device crash with iWARP, iSCSI or FCoE offload. - Fix GRE tunneled traffic when iWARP offload is enabled. - Fix RoCE failure in ib_send_bw when using inline data. - Fix latency optimization flow for inline WQEs. - BigBear 100G fix RDMA: - Reduce task context size. - Application page sizes above 2GB support. - Performance improvements. ETH: - Tenant DCB support. - Replace RSS indirection table update interface. Misc: - Debug Tools changes. Signed-off-by: Denis Bolotin <denis.bolotin@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'rtnetlink-add-IFA_TARGET_NETNSID-for-RTM_GETADDR'	David S. Miller	2018-09-06	7	-63/+180
\|\ \ 	Christian Brauner says: ==================== rtnetlink: add IFA_TARGET_NETNSID for RTM_GETADDR This iteration should mainly address the suggestion to use IFA_TARGET_NETNSID as the property name. Additionally, an alias for the already existing IFLA_IF_NETNSID property is added. Note that two additional cleanup patches (8/9 and 9/9) were added to address concerns raised that passing more than 6 arguments to a function will cause additional variables to be pushed onto the stack instead of being placed into registers. The way I addressed this is by introducing two new struct inet{6}_fill_args that are used to pass common information down to inet{6}_fill_if() functions shortening all those functions to three pointer arguments. If this is something more people than Kirill find useful they can be kept if not they can simply be dropped in later iterations of this series or when merging. Here is a short overview: 1. Rename from IFA_IF_NETNSID to IFA_TARGET_NETNSID. 2. Add IFLA_TARGET_NETNSID as an alias for IFLA_IF_NETNSID and switch all occurrences over to the new alias. 3. Add inet_fill_args struct to avoid passing more than 6 arguments in inet_fill_if() functions. 4. Add inet6_fill_args struct to avoid passing more than 6 arguments in inet6_fill_if() functions. The only functional change is the export of rtnl_get_net_ns_capable() which is needed in case ipv6 is built as a module. Note, I did not change the property name to IFA_TARGET_NSID as there was no clear agreement what would be preferred. My personal preference is to keep the IFA_IF_NETNSID name because it aligns naturally with the IFLA_IF_NETNSID property for RTM_LINK requests. Jiri seems to prefer this name too. However, if there is agreement that another property name makes more sense I'm happy to send a v2 that changes this. To test this patchset I performed 1 million getifaddrs() requests against a network namespace containing 5 interfaces (lo, eth{0-4}). The first test used a network namespace aware getifaddrs() implementation I wrote and the second test used the traditional setns() + getifaddrs() method. The results show that this patchset allows userspace to cut retrieval time in half: 1. netns_getifaddrs(): 82 microseconds 2. setns() + getifaddrs(): 162 microseconds A while back we introduced and enabled IFLA_IF_NETNSID in RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led to significant performance increases since it allows userspace to avoid taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the interfaces from the netns associated with the netns_fd. Especially when a lot of network namespaces are in use, using setns() becomes increasingly problematic when performance matters. Usually, RTM_GETLINK requests are followed by RTM_GETADDR requests (cf. getifaddrs() style functions and friends). But currently, RTM_GETADDR requests do not support a property similar to IFLA_IF_NETNSID for RTM_LINK requests. This is problematic since userspace can retrieve interfaces from another network namespace by sending an IFLA_IF_NETNSID property along with an RTM_GETLINK request but is still forced to use the legacy setns() style of retrieving interfaces in RTM_GETADDR requests. The goal of this series is to make it possible to perform RTM_GETADDR requests on different network namespaces. To this end a new IFA_IF_NETNSID property for RTM_ADDR requests is introduced. It can be used to send a network namespace identifier along in RTM_ADDR requests. The network namespace identifier will be used to retrieve the target network namespace in which the request is supposed to be fulfilled. This aligns the behavior of RTM_ADDR requests with the behavior of RTM_*LINK requests. - The caller must have assigned a valid network namespace identifier for the target network namespace. - The caller must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. [1]: commit 7973bfd8758d ("rtnetlink: remove check for IFLA_IF_NETNSID") [2]: commit 5bb8ed075428 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_NEWLINK") [3]: commit b61ad68a9fe8 ("rtnetlink: enable IFLA_IF_NETNSID for RTM_DELLINK") [4]: commit c310bfcb6e1b ("rtnetlink: enable IFLA_IF_NETNSID for RTM_SETLINK") [5]: commit 7c4f63ba8243 ("rtnetlink: enable IFLA_IF_NETNSID in do_setlink()") ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv6: add inet6_fill_args	Christian Brauner	2018-09-06	1	-34/+55
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	inet6_fill_if{addr,mcaddr, acaddr}() already took 6 arguments which meant the 7th argument would need to be pushed onto the stack on x86. Add a new struct inet6_fill_args which holds common information passed to inet6_fill_if{addr,mcaddr, acaddr}() and shortens the functions to three pointer arguments. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv4: add inet_fill_args	Christian Brauner	2018-09-06	1	-15/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	inet_fill_ifaddr() already took 6 arguments which meant the 7th argument would need to be pushed onto the stack on x86. Add a new struct inet_fill_args which holds common information passed to inet_fill_ifaddr() and shortens the function to three pointer arguments. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
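The refactoring described in this message and in the ipv6 one above it follows a common C pattern: bundle the scalar parameters into one struct so the function takes three pointers. On x86-64 the System V calling convention passes the first six integer or pointer arguments in registers, so staying at or below that count avoids stack-passed arguments. The sketch below is a userspace illustration only; `fill_args`, `addr_entry`, `out_buf` and `fill_addr` are hypothetical names and do not reproduce the kernel's struct inet_fill_args or inet_fill_ifaddr().

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bundle of per-request values that would otherwise be
 * passed as separate scalar arguments. */
struct fill_args {
    unsigned int portid;
    unsigned int seq;
    int event;
    unsigned int flags;
    int netnsid;
};

/* Hypothetical stand-ins for the message buffer and the address record. */
struct out_buf {
    char data[128];
    size_t used;
};

struct addr_entry {
    const char *label;
};

/* Three pointer arguments instead of seven mixed ones. */
static int fill_addr(struct out_buf *out, const struct addr_entry *e,
                     const struct fill_args *args)
{
    int n = snprintf(out->data, sizeof(out->data),
                     "%s seq=%u event=%d netnsid=%d",
                     e->label, args->seq, args->event, args->netnsid);

    if (n < 0 || (size_t)n >= sizeof(out->data))
        return -1;              /* message did not fit */
    out->used = (size_t)n;
    return 0;
}
```

A caller fills one `struct fill_args` per request and reuses it for every address it emits, so adding a further field later does not change any function signature.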
\| * \|	rtnetlink: s/IFLA_IF_NETNSID/IFLA_TARGET_NETNSID/g	Christian Brauner	2018-09-06	1	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	IFLA_TARGET_NETNSID is the new alias for IFLA_IF_NETNSID. This commit replaces all occurrences of IFLA_IF_NETNSID with the new alias to indicate that this identifier is the preferred one. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	if_link: add IFLA_TARGET_NETNSID alias	Christian Brauner	2018-09-06	2	-0/+2
\| \| 	This adds IFLA_TARGET_NETNSID as an alias for IFLA_IF_NETNSID for RTM_LINK requests. The new name is clearer and also aligns with the newly introduced IFA_TARGET_NETNSID property for RTM_ADDR requests. Signed-off-by: Christian Brauner <christian@brauner.io> Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Cc: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	rtnetlink: move type calculation out of loop	Christian Brauner	2018-09-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I don't see how the type - which is one of RTM_{GETADDR,GETROUTE,GETNETCONF} - can change. So do the message type calculation once before entering the for loop. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv6: enable IFA_TARGET_NETNSID for RTM_GETADDR	Christian Brauner	2018-09-06	1	-15/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Backwards Compatibility: If userspace wants to determine whether ipv6 RTM_GETADDR requests support the new IFA_TARGET_NETNSID property it should verify that the reply includes the IFA_TARGET_NETNSID property. If it does not userspace should assume that IFA_TARGET_NETNSID is not supported for ipv6 RTM_GETADDR requests on this kernel. - From what I gather from current userspace tools that make use of RTM_GETADDR requests some of them pass down struct ifinfomsg when they should actually pass down struct ifaddrmsg. To not break existing tools that pass down the wrong struct we will do the same as for RTM_GETLINK \| NLM_F_DUMP requests and not error out when the nlmsg_parse() fails. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ipv4: enable IFA_TARGET_NETNSID for RTM_GETADDR	Christian Brauner	2018-09-06	1	-8/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Backwards Compatibility: If userspace wants to determine whether ipv4 RTM_GETADDR requests support the new IFA_TARGET_NETNSID property it should verify that the reply includes the IFA_TARGET_NETNSID property. If it does not userspace should assume that IFA_TARGET_NETNSID is not supported for ipv4 RTM_GETADDR requests on this kernel. - From what I gather from current userspace tools that make use of RTM_GETADDR requests some of them pass down struct ifinfomsg when they should actually pass down struct ifaddrmsg. To not break existing tools that pass down the wrong struct we will do the same as for RTM_GETLINK \| NLM_F_DUMP requests and not error out when the nlmsg_parse() fails. - Security: Callers must have CAP_NET_ADMIN in the owning user namespace of the target network namespace. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	if_addr: add IFA_TARGET_NETNSID	Christian Brauner	2018-09-06	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds a new IFA_TARGET_NETNSID property to be used by address families such as PF_INET and PF_INET6. The IFA_TARGET_NETNSID property can be used to send a network namespace identifier as part of a request. If an IFA_TARGET_NETNSID property is identified it will be used to retrieve the target network namespace in which the request is to be made. Signed-off-by: Christian Brauner <christian@brauner.io> Cc: Jiri Benc <jbenc@redhat.com> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	rtnetlink: add rtnl_get_net_ns_capable()	Christian Brauner	2018-09-06	2	-4/+14
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	get_target_net() will be used in follow-up patches in ipv{4,6} codepaths to retrieve network namespaces based on network namespace identifiers. So remove the static declaration and export in the rtnetlink header. Also, rename it to rtnl_get_net_ns_capable() to make it obvious what this function is doing. Export rtnl_get_net_ns_capable() so it can be used when ipv6 is built as a module. Signed-off-by: Christian Brauner <christian@brauner.io> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'net-lan78xx-Minor-improvements'	David S. Miller	2018-09-06	2	-34/+14
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Stefan Wahren says: ==================== net: lan78xx: Minor improvements This patch series contains some minor improvements for the lan78xx driver. Changes in V2: - Keep Copyright comment as multi-line - Add Raghuram's Reviewed-by ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: lan78xx: Make declaration style consistent	Stefan Wahren	2018-09-06	1	-7/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch makes some declarations more consistent. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Raghuram Chary Jallipalli <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: lan78xx: Switch to SPDX identifier	Stefan Wahren	2018-09-06	2	-26/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adopt the SPDX license identifier headers to ease license compliance management. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: lan78xx: Drop unnecessary strcpy in lan78xx_probe	Stefan Wahren	2018-09-06	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no need for this strcpy because alloc_etherdev() already does this job. Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Raghuram Chary Jallipalli <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net: lan78xx: Bail out if lan78xx_get_endpoints fails	Stefan Wahren	2018-09-06	1	-0/+5
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We need to bail out if lan78xx_get_endpoints() fails, otherwise the result is overwritten. Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Reviewed-by: Raghuram Chary Jallipalli <raghuramchary.jallipalli@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	nfp: separate VXLAN and GRE feature handling	Jakub Kicinski	2018-09-06	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the VXLAN and GRE FW features both have to be advertised for the driver to enable either of them. Separate the handling. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'nfp-improve-the-new-rtsym-helpers'	David S. Miller	2018-09-06	1	-15/+60
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jakub Kicinski says: ==================== nfp: improve the new rtsym helpers This set fixes a bug in ABS rtsym handling I added in net-next, it expands the error checking and reporting on the rtsym accesses. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	nfp: validate rtsym accesses fall within the symbol	Jakub Kicinski	2018-09-06	1	-3/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the accesses to rtsyms now all going via special helpers we can easily make sure the driver is not reading past the end of the symbol. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	nfp: prefix rtsym error messages with symbol name	Jakub Kicinski	2018-09-06	1	-10/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For ease of debug preface all error messages with the name of the symbol which caused them. Use the same message format for existing messages while at it. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	nfp: fix readq on absolute RTsyms	Jakub Kicinski	2018-09-06	1	-2/+4
\|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Return the error and report value through the output param. Fixes: 640917dd81b6 ("nfp: support access to absolute RTsyms") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Francois H. Theron <francois.theron@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	failover: Add missing check to validate 'slave_dev' in net_failover_slave_unregister	YueHaibing	2018-09-06	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes gcc '-Wunused-but-set-variable' warning: drivers/net/net_failover.c: In function 'net_failover_slave_unregister': drivers/net/net_failover.c:598:35: warning: variable 'primary_dev' set but not used [-Wunused-but-set-variable] The validity of 'slave_dev' should be checked. Fixes: cfc80d9a1163 ("net: Introduce net_failover driver") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	netlink: Make groups check less stupid in netlink_bind()	Dmitry Safonov	2018-09-06	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As Linus noted, the test for 0 is needless, groups type can follow the usual kernel style and 8*sizeof(unsigned long) is BITS_PER_LONG: > The code [..] isn't technically incorrect... > But it is stupid. > Why stupid? Because the test for 0 is pointless. > > Just doing > if (nlk->ngroups < 8*sizeof(groups)) > groups &= (1UL << nlk->ngroups) - 1; > > would have been fine and more understandable, since the "mask by shift > count" already does the right thing for a ngroups value of 0. Now that > test for zero makes me go "what's special about zero?". It turns out > that the answer to that is "nothing". [..] > The type of "groups" is kind of silly too. > > Yeah, "long unsigned int" isn't _technically_ wrong. But we normally > call that type "unsigned long". Cleanup my piece of pointlessness. Cc: "David S. Miller" <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: netdev@vger.kernel.org Fairly-blamed-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
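The "mask by shift count" point quoted in the netlink_bind() commit above can be illustrated in isolation. This is a standalone sketch, not the kernel function: `mask_groups` is a hypothetical name, and `CHAR_BIT * sizeof(groups)` stands in for the kernel's BITS_PER_LONG.

```c
#include <limits.h>

/*
 * Hypothetical helper: keep only the low `ngroups` bits of `groups`.
 * No special case for ngroups == 0 is needed, because (1UL << 0) - 1
 * is already 0 and clears every bit.
 */
static unsigned long mask_groups(unsigned long groups, unsigned int ngroups)
{
	/* Shifting by the full width of the type is undefined in C,
	 * so the mask is applied only when ngroups is smaller. */
	if (ngroups < CHAR_BIT * sizeof(groups))
		groups &= (1UL << ngroups) - 1;
	return groups;
}
```

With `ngroups` equal to 0 the result is 0, with 4 only the low four bits survive, and with a value at or above the bit width of `unsigned long` the input is returned unchanged.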
* \|	packet: add sockopt to ignore outgoing packets	Vincent Whitchurch	2018-09-06	4	-0/+22
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the only way to ignore outgoing packets on a packet socket is via the BPF filter. With MSG_ZEROCOPY, packets that are looped into AF_PACKET are copied in dev_queue_xmit_nit(), and this copy happens even if the filter run from packet_rcv() would reject them. So the presence of a packet socket on the interface takes away the benefits of MSG_ZEROCOPY, even if the packet socket is not interested in outgoing packets. (Even when MSG_ZEROCOPY is not used, the skb is unnecessarily cloned, but the cost for that is much lower.) Add a socket option to allow AF_PACKET sockets to ignore outgoing packets to solve this. Note that the *BSDs already have something similar: BIOCSSEESENT/BIOCSDIRECTION and BIOCSDIRFILT. The first intended user is lldpd. Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
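A packet-socket user such as the one described above might enable the new option roughly like this. It is a minimal sketch under assumptions: the commit message here does not name the option, so the name `PACKET_IGNORE_OUTGOING` and the fallback value 23 are taken from the mainline `linux/if_packet.h` and should be checked against the target kernel headers. The helper name `packet_ignore_outgoing` is hypothetical.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Assumption: fallback for older headers; mainline defines this as 23. */
#ifndef PACKET_IGNORE_OUTGOING
#define PACKET_IGNORE_OUTGOING 23
#endif

/*
 * Hypothetical helper: ask the kernel not to loop outgoing packets into
 * the AF_PACKET socket `fd`. Returns 0 on success, -1 on failure with
 * errno set (for example ENOPROTOOPT on kernels without the option).
 */
static int packet_ignore_outgoing(int fd)
{
	int one = 1;

	return setsockopt(fd, SOL_PACKET, PACKET_IGNORE_OUTGOING,
			  &one, sizeof(one));
}
```

The call would typically be made right after `socket(AF_PACKET, ...)` and before binding, so that no outgoing packets are queued in the meantime. On kernels that lack the option, a BPF filter remains the fallback, as the commit message notes.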