path: root/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
Commit message | Author | Age | Files | Lines
* mlx4: xdp_prog becomes inactive after ethtool '-L' or '-G' | Martin KaFai Lau | 2017-02-03 | 1 | -4/+23

After calling mlx4_en_try_alloc_resources() (e.g. by changing the
number of rx-queues with ethtool -L), the existing xdp_prog becomes
inactive.  The bug is that the xdp_prog pointer has not been carried
over from the old rx-queues to the new rx-queues.

Fixes: 47a38e155037 ("net/mlx4_en: add support for fast rx drop bpf program")
Cc: Brenden Blanco <bblanco@plumgrid.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
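A sketch of the shape of such a fix (the locking and surrounding names
such as 'tmp' are assumptions, not the literal patch): the newly
allocated RX rings must inherit the program, with one reference taken
per ring.

    struct bpf_prog *xdp_prog;
    int i;

    xdp_prog = rcu_dereference_protected(priv->rx_ring[0]->xdp_prog,
                    lockdep_is_held(&priv->mdev->state_lock));
    if (xdp_prog) {
            /* one reference per new RX ring */
            xdp_prog = bpf_prog_add(xdp_prog, tmp->rx_ring_num);
            if (IS_ERR(xdp_prog))
                    return PTR_ERR(xdp_prog);
            for (i = 0; i < tmp->rx_ring_num; i++)
                    rcu_assign_pointer(tmp->rx_ring[i]->xdp_prog,
                                       xdp_prog);
    }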
* mlx4: Fix memory leak after mlx4_en_update_priv() | Martin KaFai Lau | 2017-02-03 | 1 | -6/+2

In mlx4_en_update_priv(), dst->tx_ring[t] and dst->tx_cq[t] are
over-written by src->tx_ring[t] and src->tx_cq[t] without first
calling kfree.  One of the reproducible code paths is by doing
'ethtool -L'.  The fix is to do the kfree in mlx4_en_free_resources().

Here is the kmemleak report:

unreferenced object 0xffff880841211800 (size 2048):
  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
    [<ffffffff8170e0a8>] mlx4_en_try_alloc_resources+0x118/0x1a0
    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff880841213000 (size 2048):
  comm "ethtool", pid 3096, jiffies 4294716940 (age 528.353s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81930718>] kmemleak_alloc+0x28/0x50
    [<ffffffff8120b213>] kmem_cache_alloc_trace+0x103/0x260
    [<ffffffff8170e0cb>] mlx4_en_try_alloc_resources+0x13b/0x1a0
    [<ffffffff817065a9>] mlx4_en_set_ringparam+0x169/0x210
    [<ffffffff818040c5>] dev_ethtool+0xae5/0x2190
    [<ffffffff8181b898>] dev_ioctl+0x168/0x6f0
    [<ffffffff817d7a72>] sock_do_ioctl+0x42/0x50
    [<ffffffff817d819b>] sock_ioctl+0x21b/0x2d0
    [<ffffffff81247a73>] do_vfs_ioctl+0x93/0x6a0
    [<ffffffff812480f9>] SyS_ioctl+0x79/0x90
    [<ffffffff8193d7ea>] entry_SYSCALL_64_fastpath+0x18/0xad
    [<ffffffffffffffff>] 0xffffffffffffffff

(gdb) list *mlx4_en_try_alloc_resources+0x118
0xffffffff8170e0a8 is in mlx4_en_try_alloc_resources
(drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2145).
2140		if (!dst->tx_ring_num[t])
2141			continue;
2142
2143		dst->tx_ring[t] = kzalloc(sizeof(struct mlx4_en_tx_ring *) *
2144					  MAX_TX_RINGS, GFP_KERNEL);
2145		if (!dst->tx_ring[t])
2146			goto err_free_tx;
2147
2148		dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149					MAX_TX_RINGS, GFP_KERNEL);

(gdb) list *mlx4_en_try_alloc_resources+0x13b
0xffffffff8170e0cb is in mlx4_en_try_alloc_resources
(drivers/net/ethernet/mellanox/mlx4/en_netdev.c:2150).
2145		if (!dst->tx_ring[t])
2146			goto err_free_tx;
2147
2148		dst->tx_cq[t] = kzalloc(sizeof(struct mlx4_en_cq *) *
2149					MAX_TX_RINGS, GFP_KERNEL);
2150		if (!dst->tx_cq[t]) {
2151			kfree(dst->tx_ring[t]);
2152			goto err_free_tx;
2153		}
2154	}

Fixes: ec25bc04ed8e ("net/mlx4_en: Add resilience in low memory systems")
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4: do not call napi_schedule() without care | Eric Dumazet | 2017-01-16 | 1 | -1/+4

Disable BH around the call to napi_schedule() to avoid the following
warning:

[   52.095499] NOHZ: local_softirq_pending 08
[   52.421291] NOHZ: local_softirq_pending 08
[   52.608313] NOHZ: local_softirq_pending 08

Fixes: 8d59de8f7bb3 ("net/mlx4_en: Process all completions in RX rings after port goes up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
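The pattern in question, as a sketch ('cq' is illustrative):
napi_schedule() only raises NET_RX_SOFTIRQ, so a caller in process
context must wrap it in a BH-disabled section that runs the pending
softirq on exit:

    local_bh_disable();
    napi_schedule(&cq->napi);   /* raises NET_RX_SOFTIRQ */
    local_bh_enable();          /* pending softirqs run here */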
* mlx4: Return EOPNOTSUPP instead of ENOTSUPP | Martin KaFai Lau | 2017-01-11 | 1 | -1/+1

Commit b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page
minus eth and vlan hdrs") changed EOPNOTSUPP to ENOTSUPP by mistake.
This patch fixes it.

Fixes: b45f0674b997 ("mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs")
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Fix user prio field in XDP forward | Tariq Toukan | 2016-12-23 | 1 | -1/+2

The user prio field is wrong (and overflows) in the XDP forward flow.
This is a result of a bad value for num_tx_rings_p_up, which should
account for all XDP TX rings, as they operate for the same user prio.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4: xdp: Reserve headroom for receiving packet when XDP prog is active | Martin KaFai Lau | 2016-12-08 | 1 | -6/+2

Reserve XDP_PACKET_HEADROOM for the packet and enable
bpf_xdp_adjust_head() support.  This patch only affects the code path
when XDP is active.

After testing, the tx_dropped counter is incremented if the xdp_prog
sends more than wire MTU.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* mlx4: xdp: Allow raising MTU up to one page minus eth and vlan hdrs | Martin KaFai Lau | 2016-12-08 | 1 | -8/+20

When XDP is active in mlx4, mlx4 is using one page/pkt.  At the same
time (i.e. when XDP is active), it is currently limiting the MTU to be
FRAG_SZ0 - ETH_HLEN - (2 * VLAN_HLEN), which is 1514 on x86.  AFAICT,
we can at least raise the MTU limit up to
PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN), which this patch is doing.
It will be useful in the next patch, which allows an XDP program to
extend the packet by adding new header(s).

Note: in the earlier XDP patches, there is already an existing guard
to ensure the page/pkt scheme only applies when XDP is active in mlx4.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
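The resulting ceiling, expressed as a sketch (the macro name is
illustrative, not the driver's):

    /* One page per packet when XDP is active, so the frame plus the
     * Ethernet and two (stacked) VLAN headers must fit in a page. */
    #define XDP_MAX_MTU (PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN))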
* bpf: xdp: Allow head adjustment in XDP prog | Martin KaFai Lau | 2016-12-08 | 1 | -0/+5

This patch allows an XDP program to extend/remove the packet data at
the head (like adding or removing a header).  It is done by adding a
new XDP helper, bpf_xdp_adjust_head().

It also renames bpf_helper_changes_skb_data() to
bpf_helper_changes_pkt_data() to better reflect that an XDP program
does not work on skb.

This patch adds one "xdp_adjust_head" bit to bpf_prog for the
XDP-capable driver to check if the XDP program requires
bpf_xdp_adjust_head() support.  The driver can then decide to error
out during XDP_SETUP_PROG.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
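A minimal user of the new helper, as a sketch (built with
clang -O2 -target bpf; the helper declaration follows the style of the
kernel's BPF samples):

    #include <linux/bpf.h>

    static int (*bpf_xdp_adjust_head)(void *ctx, int offset) =
            (void *)BPF_FUNC_xdp_adjust_head;

    __attribute__((section("xdp"), used))
    int xdp_push_4b(struct xdp_md *ctx)
    {
            void *data, *data_end;

            if (bpf_xdp_adjust_head(ctx, -4))   /* grow headroom by 4B */
                    return XDP_DROP;

            data = (void *)(long)ctx->data;
            data_end = (void *)(long)ctx->data_end;
            if (data + 4 > data_end)            /* verifier bounds check */
                    return XDP_DROP;
            __builtin_memset(data, 0, 4);       /* fill the new header */
            return XDP_TX;
    }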
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 2016-12-03 | 1 | -15/+2

Couple of conflicts resolved here:

1) In the MACB driver, a bug fix to properly initialize the RX tail
   pointer overlapped with some changes to support variable sized
   rings.

2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping
   with a reorganization of the driver to support ACPI, OF, as well as
   PCI variants of the chip.

3) In 'net' we had several probe error path bug fixes to the stmmac
   driver, meanwhile a lot of this code was cleaned up and reorganized
   in 'net-next'.

4) The cls_flower classifier obtained a helper function in 'net-next'
   called __fl_delete() and this overlapped with Daniel Borkmann's bug
   fix to use RCU for object destruction in 'net'.  It also overlapped
   with Jiri's change to guard the rhashtable_remove_fast() call with
   a check against tc_skip_sw().

5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated
   changes in 'net-next'.

6) In geneve, a stale header pointer after pskb_expand_head() bug fix
   in 'net' overlapped with a large reorganization of the same code in
   'net-next'.  Since the 'net-next' code no longer had the bug in
   question, there was nothing to do other than to simply take the
   'net-next' hunks.

Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "net/mlx4_en: Avoid unregister_netdev at shutdown flow"Tariq Toukan2016-11-281-15/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 9d76931180557270796f9631e2c79b9c7bb3c9fb. Using unregister_netdev at shutdown flow prevents calling the netdev's ndos or trying to access its freed resources. This fixes crashes like the following: Call Trace: [<ffffffff81587a6e>] dev_get_phys_port_id+0x1e/0x30 [<ffffffff815a36ce>] rtnl_fill_ifinfo+0x4be/0xff0 [<ffffffff815a53f3>] rtmsg_ifinfo_build_skb+0x73/0xe0 [<ffffffff815a5476>] rtmsg_ifinfo.part.27+0x16/0x50 [<ffffffff815a54c8>] rtmsg_ifinfo+0x18/0x20 [<ffffffff8158a6c6>] netdev_state_change+0x46/0x50 [<ffffffff815a5e78>] linkwatch_do_dev+0x38/0x50 [<ffffffff815a6165>] __linkwatch_run_queue+0xf5/0x170 [<ffffffff815a6205>] linkwatch_event+0x25/0x30 [<ffffffff81099a82>] process_one_work+0x152/0x400 [<ffffffff8109a325>] worker_thread+0x125/0x4b0 [<ffffffff8109a200>] ? rescuer_thread+0x350/0x350 [<ffffffff8109fc6a>] kthread+0xca/0xe0 [<ffffffff8109fba0>] ? kthread_park+0x60/0x60 [<ffffffff816a1285>] ret_from_fork+0x25/0x30 Fixes: 9d7693118055 ("net/mlx4_en: Avoid unregister_netdev at shutdown flow") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reported-by: Steve Wise <swise@opengridcomputing.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: fix use-after-free in mlx4_en_fold_software_stats() | Eric Dumazet | 2016-12-02 | 1 | -0/+4

My recent commit to get more precise rx/tx counters in
ndo_get_stats64() can lead to crashes at device dismantle, as Jesper
found out.

We must prevent mlx4_en_fold_software_stats() from trying to access
tx/rx rings if they are deleted.  Fix this by adding a test against
priv->port_up in mlx4_en_fold_software_stats().

Calling mlx4_en_fold_software_stats() from mlx4_en_stop_port() allows
us to eventually broadcast the latest/current counters to rtnetlink
monitors.

Fixes: 40931b85113d ("mlx4: give precise rx/tx bytes/packets counters")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-and-bisected-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@dev.mellanox.co.il>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: give precise rx/tx bytes/packets counters | Eric Dumazet | 2016-11-29 | 1 | -0/+1

mlx4 stats are chaotic because a deferred work queue is responsible
for updating them every 250 ms.  Even sampling stats every second with
"sar -n DEV 1" gives variations like the following:

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:39:22  eth0 146877.00 3265554.00  9467.15 4828168.50
07:39:23  eth0 146587.00 3260329.00  9448.15 4820445.98
07:39:24  eth0 146894.00 3259989.00  9468.55 4819943.26
07:39:25  eth0 110368.00 2454497.00  7113.95 3629012.17 <<>>
07:39:26  eth0 146563.00 3257502.00  9447.25 4816266.23
07:39:27  eth0 145678.00 3258292.00  9389.79 4817414.39
07:39:28  eth0 145268.00 3253171.00  9363.85 4809852.46
07:39:29  eth0 146439.00 3262185.00  9438.97 4823172.48
07:39:30  eth0 146758.00 3264175.00  9459.94 4826124.13
07:39:31  eth0 146843.00 3256903.00  9465.44 4815381.97
Average:  eth0 142827.50 3179259.70  9206.30 4700578.16

This patch allows rx/tx bytes/packets counters to be folded at the
time we need stats.

We now can fetch stats every 1 ms if we want to check NIC behavior on
a small time window.  It is also easier to detect anomalies.

lpaa23:~# sar -n DEV 1 10 | grep eth0 | cut -c1-65
07:42:50  eth0 142915.00 3177696.00  9212.06 4698270.42
07:42:51  eth0 143741.00 3200232.00  9265.15 4731593.02
07:42:52  eth0 142781.00 3171600.00  9202.92 4689260.16
07:42:53  eth0 143835.00 3192932.00  9271.80 4720761.39
07:42:54  eth0 141922.00 3165174.00  9147.64 4679759.21
07:42:55  eth0 142993.00 3207038.00  9216.78 4741653.05
07:42:56  eth0 141394.06 3154335.64  9113.85 4663731.73
07:42:57  eth0 141850.00 3161202.00  9144.48 4673866.07
07:42:58  eth0 143439.00 3180736.00  9246.05 4702755.35
07:42:59  eth0 143501.00 3210992.00  9249.99 4747501.84
Average:  eth0 142835.66 3182165.93  9206.98 4704874.08

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: do not use priv->stats_lock in mlx4_en_auto_moderation() | Eric Dumazet | 2016-11-27 | 1 | -4/+2

Per-RX-ring packets/bytes counters are not protected by the global
priv->stats_lock.  Better not to confuse the reader: use READ_ONCE()
to show we read these counters without surrounding synchronization.

Interrupt moderation is best effort, and we do not really care about
ultra precise counters.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
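The annotated reads then look like this sketch (field names per the
commit; the surrounding loop is omitted):

    /* Racy by design: the datapath updates these counters without a
     * lock, and moderation only needs approximate rates. */
    rx_packets = READ_ONCE(priv->rx_ring[ring]->packets);
    rx_bytes   = READ_ONCE(priv->rx_ring[ring]->bytes);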
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 2016-11-27 | 1 | -1/+4

The udplite conflict is resolved by taking what 'net-next' did, which
removed the backlog receive method assignment, since it is no longer
necessary.

Two entries were added to the non-priv ethtool operations switch
statement, one in 'net' and one in 'net-next', so simple overlapping
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Free netdev resources under state lock | Tariq Toukan | 2016-11-24 | 1 | -1/+4

Make sure mlx4_en_free_resources is called under the netdev state
lock.  This is needed since RCU dereference of an XDP prog should be
protected.

Fixes: 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sagi Grimberg <sagi@grimberg.me>
CC: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4: reorganize struct mlx4_en_tx_ring | Eric Dumazet | 2016-11-24 | 1 | -1/+1

Goal is to reorganize this critical structure to increase performance.

ndo_start_xmit() should only dirty one cache line, and access as few
cache lines as possible.

Add sp_ (Slow Path) prefix to fields that are not used in fast path,
to make clear what is going on.

After this patch pahole reports something much better, as all
ndo_start_xmit() needed fields are packed into two cache lines instead
of seven or eight:

struct mlx4_en_tx_ring {
	u32 last_nr_txbb;                   /*     0   0x4 */
	u32 cons;                           /*   0x4   0x4 */
	long unsigned int wake_queue;       /*   0x8   0x8 */
	struct netdev_queue * tx_queue;     /*  0x10   0x8 */
	u32 (*free_tx_desc)(struct mlx4_en_priv *,
			    struct mlx4_en_tx_ring *,
			    int, u8, u64, int); /* 0x18   0x8 */
	struct mlx4_en_rx_ring * recycle_ring; /* 0x20  0x8 */

	/* XXX 24 bytes hole, try to pack */

	/* --- cacheline 1 boundary (64 bytes) --- */
	u32 prod;                           /*  0x40   0x4 */
	unsigned int tx_dropped;            /*  0x44   0x4 */
	long unsigned int bytes;            /*  0x48   0x8 */
	long unsigned int packets;          /*  0x50   0x8 */
	long unsigned int tx_csum;          /*  0x58   0x8 */
	long unsigned int tso_packets;      /*  0x60   0x8 */
	long unsigned int xmit_more;        /*  0x68   0x8 */
	struct mlx4_bf bf;                  /*  0x70  0x18 */
	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
	__be32 doorbell_qpn;                /*  0x88   0x4 */
	__be32 mr_key;                      /*  0x8c   0x4 */
	u32 size;                           /*  0x90   0x4 */
	u32 size_mask;                      /*  0x94   0x4 */
	u32 full_size;                      /*  0x98   0x4 */
	u32 buf_size;                       /*  0x9c   0x4 */
	void * buf;                         /*  0xa0   0x8 */
	struct mlx4_en_tx_info * tx_info;   /*  0xa8   0x8 */
	int qpn;                            /*  0xb0   0x4 */
	u8 queue_index;                     /*  0xb4   0x1 */
	bool bf_enabled;                    /*  0xb5   0x1 */
	bool bf_alloced;                    /*  0xb6   0x1 */
	u8 hwtstamp_tx_type;                /*  0xb7   0x1 */
	u8 * bounce_buf;                    /*  0xb8   0x8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	long unsigned int queue_stopped;    /*  0xc0   0x8 */
	struct mlx4_hwq_resources sp_wqres; /*  0xc8  0x58 */
	/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
	struct mlx4_qp sp_qp;               /* 0x120  0x30 */
	/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
	struct mlx4_qp_context sp_context;  /* 0x150  0xf8 */
	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
	cpumask_t sp_affinity_mask;         /* 0x248  0x20 */
	enum mlx4_qp_state sp_qp_state;     /* 0x268   0x4 */
	u16 sp_stride;                      /* 0x26c   0x2 */
	u16 sp_cqn;                         /* 0x26e   0x2 */

	/* size: 640, cachelines: 10, members: 36 */
	/* sum members: 600, holes: 1, sum holes: 24 */
	/* padding: 16 */
};

Instead of this silly placement:

struct mlx4_en_tx_ring {
	u32 last_nr_txbb;                   /*     0   0x4 */
	u32 cons;                           /*   0x4   0x4 */
	long unsigned int wake_queue;       /*   0x8   0x8 */

	/* XXX 48 bytes hole, try to pack */

	/* --- cacheline 1 boundary (64 bytes) --- */
	u32 prod;                           /*  0x40   0x4 */

	/* XXX 4 bytes hole, try to pack */

	long unsigned int bytes;            /*  0x48   0x8 */
	long unsigned int packets;          /*  0x50   0x8 */
	long unsigned int tx_csum;          /*  0x58   0x8 */
	long unsigned int tso_packets;      /*  0x60   0x8 */
	long unsigned int xmit_more;        /*  0x68   0x8 */
	unsigned int tx_dropped;            /*  0x70   0x4 */

	/* XXX 4 bytes hole, try to pack */

	struct mlx4_bf bf;                  /*  0x78  0x18 */
	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
	long unsigned int queue_stopped;    /*  0x90   0x8 */
	cpumask_t affinity_mask;            /*  0x98  0x10 */
	struct mlx4_qp qp;                  /*  0xa8  0x30 */
	/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
	struct mlx4_hwq_resources wqres;    /*  0xd8  0x58 */
	/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
	u32 size;                           /* 0x130   0x4 */
	u32 size_mask;                      /* 0x134   0x4 */
	u16 stride;                         /* 0x138   0x2 */

	/* XXX 2 bytes hole, try to pack */

	u32 full_size;                      /* 0x13c   0x4 */
	/* --- cacheline 5 boundary (320 bytes) --- */
	u16 cqn;                            /* 0x140   0x2 */

	/* XXX 2 bytes hole, try to pack */

	u32 buf_size;                       /* 0x144   0x4 */
	__be32 doorbell_qpn;                /* 0x148   0x4 */
	__be32 mr_key;                      /* 0x14c   0x4 */
	void * buf;                         /* 0x150   0x8 */
	struct mlx4_en_tx_info * tx_info;   /* 0x158   0x8 */
	struct mlx4_en_rx_ring * recycle_ring; /* 0x160 0x8 */
	u32 (*free_tx_desc)(struct mlx4_en_priv *,
			    struct mlx4_en_tx_ring *,
			    int, u8, u64, int); /* 0x168  0x8 */
	u8 * bounce_buf;                    /* 0x170   0x8 */
	struct mlx4_qp_context context;     /* 0x178  0xf8 */
	/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
	int qpn;                            /* 0x270   0x4 */
	enum mlx4_qp_state qp_state;        /* 0x274   0x4 */
	u8 queue_index;                     /* 0x278   0x1 */
	bool bf_enabled;                    /* 0x279   0x1 */
	bool bf_alloced;                    /* 0x27a   0x1 */

	/* XXX 5 bytes hole, try to pack */

	/* --- cacheline 10 boundary (640 bytes) --- */
	struct netdev_queue * tx_queue;     /* 0x280   0x8 */
	int hwtstamp_tx_type;               /* 0x288   0x4 */

	/* size: 704, cachelines: 11, members: 36 */
	/* sum members: 587, holes: 6, sum holes: 65 */
	/* padding: 52 */
};

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 2016-11-15 | 1 | -1/+0

Several cases of bug fixes in 'net' overlapping other changes in
'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "net/mlx4_en: Fix panic during reboot"Tariq Toukan2016-11-091-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 9d2afba058722d40cc02f430229c91611c0e8d16. The original issue would possibly exist if an external module tried calling our "ethtool_ops" without checking if it still exists. The right way of solving it is by simply doing the check in the caller side. Currently, no action is required as there's no such use case. Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | bpf, mlx4: fix prog refcount in mlx4_en_try_alloc_resources error path | Daniel Borkmann | 2016-11-13 | 1 | -1/+4

Commit 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings
scheme") added a bug in that the prog's reference count is not dropped
in the error path when mlx4_en_try_alloc_resources() is failing from
mlx4_xdp_set().

We previously took bpf_prog_add(prog, priv->rx_ring_num - 1), which we
need to release again.  Earlier in the call path, dev_change_xdp_fd()
itself holds a reference to the prog as well (hence the '- 1' in the
bpf_prog_add()), so a simple atomic_sub() is safe to use here.  When
an error is propagated, bpf_prog_put() is eventually called from
dev_change_xdp_fd().

Fixes: 67f8b1dcb9ee ("net/mlx4_en: Refactor the XDP forwarding rings scheme")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
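Shape of the fixed error path, as a sketch (assuming a bpf_prog_sub()
wrapper around the atomic_sub() the message refers to; the surrounding
names are illustrative):

    err = mlx4_en_try_alloc_resources(priv, tmp, &new_prof);
    if (err) {
            if (prog)
                    /* drop exactly the references taken for the rings */
                    bpf_prog_sub(prog, priv->rx_ring_num - 1);
            goto out;
    }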
* | net/mlx4_en: Add ethtool statistics for XDP cases | Tariq Toukan | 2016-11-02 | 1 | -0/+4

XDP statistics are reported in ethtool, in total and per ring, as
follows:
- xdp_drop: the number of packets dropped by XDP.
- xdp_tx: the number of packets forwarded by XDP.
- xdp_tx_full: the number of times an XDP forward failed due to a full
  XDP TX ring.

In addition, packets that are dropped/forwarded by XDP are no longer
accounted in the rx_packets/rx_bytes of the ring, so that these count
only traffic that is passed to the stack.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: Refactor the XDP forwarding rings scheme | Tariq Toukan | 2016-11-02 | 1 | -148/+230

Separately manage the two types of TX rings: regular ones and XDP.
Upon an XDP set, do not borrow regular TX rings and convert them into
XDP ones, but allocate new ones, unless we hit the max number of
rings.  This means that on systems with a smaller number of cores we
will not consume the existing TX rings for XDP while we are still
within the total TX ring limit.

XDP TX ring counters are not shown in ethtool statistics.  Instead,
XDP counters will be added to the respective RX rings in a downstream
patch.

This has no performance implications.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 2016-10-30 | 1 | -2/+11

Mostly simple overlapping changes.

For example, David Ahern's adjacency list revamp in 'net-next'
conflicted with an adjacency list traversal bug fix in 'net'.

Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Save slave ethtool stats command | Tariq Toukan | 2016-10-29 | 1 | -2/+3

Following the previous patch, as an optimization, the slave will not
even bother sending the DUMP_ETH_STATS command over the comm channel.

Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Fix panic during reboot | Eugenia Emantayev | 2016-10-29 | 1 | -0/+1

Fix a kernel panic that occurs as a result of an asynchronous event
handled in roce_gid_mgmt: mlx4_en_get_drvinfo is called and accesses
freed resources.  This happens in the shutdown flow only, since the
pci device is destroyed while the netdevice is still alive.

Fixes: c27a02cd94d6 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Process all completions in RX rings after port goes up | Erez Shitrit | 2016-10-29 | 1 | -0/+7

Currently there is a race between incoming traffic and the
initialization flow.  HW is able to receive the packets after
INIT_PORT is done and unicast steering is configured.  Before we set
priv->port_up, NAPI is not scheduled and the receive queues become
full.  Therefore we never get new interrupts about the completions.
This issue can happen when running heavy traffic while bringing the
port up.

The resolution is to schedule NAPI once port_up is set.  If the
receive queues were full, this will process all cqes and release them.

Fixes: c27a02cd94d6 ("mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | ethernet/mellanox: use core min/max MTU checking | Jarod Wilson | 2016-10-18 | 1 | -4/+4

mlx4: min_mtu 46, max_mtu depends on hardware
mlx5: min_mtu 68, max_mtu depends on hardware

CC: netdev@vger.kernel.org
CC: Tariq Toukan <tariqt@mellanox.com>
CC: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
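With core checking, a driver only publishes its range and the stack
rejects out-of-range values before ndo_change_mtu() is ever called; a
sketch for mlx4 (constants illustrative):

    dev->min_mtu = 46;              /* ETH_ZLEN - ETH_HLEN */
    dev->max_mtu = priv->max_mtu;   /* reported by the hardware */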
* net/mlx4: Add VF vlan protocol 802.1ad support | Moshe Shemesh | 2016-09-24 | 1 | -4/+2

Move the vf to VST 802.1ad mode (mlx4 VST QinQ mode) by setting the vf
vlan protocol to 802.1ad.

VST 802.1ad mode in mlx4 is used for STAG strip/insertion by the PF,
while the CTAG is set by the VF.

Read the current vlan protocol as part of the vf configuration state.

Upon setting the vf vlan protocol to 802.1ad, we use a handshake
mechanism to verify that both the vf and the pf driver version support
it.  The handshake uses the command QUERY_FUNC_CAP:
- The vf sets a pre-defined support bit in the input modifier.
- A pf that supports the feature sends the request to the vf through a
  pre-defined field in the output mailbox.
- In case the vf does not support the feature, the pf will fail the
  control command (in this case, the IP link tool command to set the
  vf vlan protocol to 802.1ad).

No change in VST 802.1Q mode.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Update API for VF vlan protocol 802.1ad support | Moshe Shemesh | 2016-09-24 | 1 | -1/+5

Introduce a new rtnl UAPI that exposes a list of vlans per VF, giving
user-space applications the ability to specify it for the VF, as an
option to support 802.1ad.  We adjusted the IP Link tool to support
this option.

For future use cases, the new UAPI supports multiple vlans.  For now
we limit the list size to a single vlan in the kernel.

Add IFLA_VF_VLAN_LIST in addition to IFLA_VF_VLAN to keep backward
compatibility with older versions of the IP Link tool.

Add a vlan protocol parameter to the ndo_set_vf_vlan callback.  We
kept 802.1Q as the drivers' default vlan protocol.

Suitable ip link tool command examples:

  Set vf vlan protocol 802.1ad:
    ip link set eth0 vf 1 vlan 100 proto 802.1ad

  Set vf to VST (802.1Q) mode:
    ip link set eth0 vf 1 vlan 100 proto 802.1Q
  or by omitting the new parameter:
    ip link set eth0 vf 1 vlan 100

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: Disable vlan HW acceleration when in VF vlan protocol 802.1ad mode | Moshe Shemesh | 2016-09-24 | 1 | -0/+13

In an Ethernet VF, disable vlan HW acceleration on the VF while it is
set to VF vlan protocol 802.1ad mode.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 2016-09-13 | 1 | -13/+8

Conflicts:
	drivers/net/ethernet/mediatek/mtk_eth_soc.c
	drivers/net/ethernet/qlogic/qed/qed_dcbx.c
	drivers/net/phy/Kconfig

All conflicts were cases of overlapping commits.

Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Fixes for DCBX | Tariq Toukan | 2016-09-12 | 1 | -13/+8

This patch adds a capability check before enabling DCBX.  In addition,
it re-organizes the relevant data structures and fixes a typo in a
define.

Fixes: af7d51852631 ("net/mlx4_en: Add DCB PFC support through CEE netlink commands")
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: protect ring->xdp_prog with rcu_read_lock | Brenden Blanco | 2016-09-06 | 1 | -3/+10

Depending on the preempt mode, the bpf_prog stored in xdp_prog may be
freed despite the use of call_rcu inside bpf_prog_put.  The situation
is possible when running in PREEMPT_RCU=y mode, for instance, since
the rcu callback for destroying the bpf prog can run even during the
bh handling in the mlx4 rx path.

Several options were considered before this patch was settled on:

Add a napi_synchronize loop in mlx4_xdp_set, which would occur after
all of the rings are updated with the new program.  This approach has
the disadvantage that as the number of rings increases, the speed of
update will slow down significantly due to napi_synchronize's
msleep(1).

Add a new rcu_head in bpf_prog_aux, to be used by a new
bpf_prog_put_bh.  The action of the bpf_prog_put_bh would be to then
call bpf_prog_put later.  Those drivers that consume a bpf prog in a
bh context (like mlx4) would then use the bpf_prog_put_bh instead when
the ring is up.  This has the problem of complexity, in maintaining
proper refcnts and rcu lists, and would likely be harder to review.
In addition, this approach to freeing must be exclusive with other
frees of the bpf prog, for instance a _bh prog must not be referenced
from a prog array that is consumed by a non-_bh prog.

The placement of rcu_read_lock in this patch is functionally the same
as putting an rcu_read_lock in napi_poll.  Actually doing so could be
a potentially controversial change, but would bring the implementation
in line with sk_busy_loop (though of course the nature of those two
paths is substantially different), and would also avoid future
copy/paste problems with future supporters of XDP.  Still, this patch
does not take that opinionated option.

Testing was done with kernels in either PREEMPT_RCU=y or
CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither
exhibiting any drawback.  With PREEMPT_RCU=n, the extra call to
rcu_read_lock did not show up in the perf report whatsoever, and with
PREEMPT_RCU=y the overhead of rcu_read_lock (according to perf) was
the same before/after.

In the rx path, rcu_read_lock is eventually called for every packet
from netif_receive_skb_internal, so the napi poll call's rcu_read_lock
is easily amortized.

v2:
Remove extra rcu_read_lock in mlx4_en_process_rx_cq body.
Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or
rcu_dereference[_protected] as appropriate.
Add explicit mutex lock around rcu_assign instead of xchg loop.

Fixes: d576acf0a22 ("net/mlx4_en: add page recycle to prepare rx ring for tx support")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
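The RX-path pattern this patch enforces, sketched:

    rcu_read_lock();
    xdp_prog = rcu_dereference(ring->xdp_prog);
    if (xdp_prog) {
            struct xdp_buff xdp;

            /* ... point xdp.data/xdp.data_end at the frame ... */
            act = bpf_prog_run_xdp(xdp_prog, &xdp);
    }
    rcu_read_unlock();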
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 2016-07-24 | 1 | -13/+97

Just several instances of overlapping changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Add resilience in low memory systems | Eugenia Emantayev | 2016-07-20 | 1 | -13/+93

This patch fixes the loss of the Ethernet port on low-memory systems,
when the driver frees its resources and fails to allocate new
resources.  The issue could happen while changing the number of
channels, the ring size, or the timestamp configuration.

This fix is necessary because of the removal of vmap use from the
code.  When vmap was in use, the driver could allocate non-contiguous
memory and make it contiguous with vmap.  Now it could fail to
allocate a large chunk of contiguous memory and lose the port.

The current code tries to allocate new resources and then, upon
success, frees the old resources.

Fixes: 73898db04301 ('net/mlx4: Avoid wrong virtual mappings')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Move filters cleanup to a proper location | Eugenia Emantayev | 2016-07-20 | 1 | -0/+4

Filters cleanup should be done once, before destroying the net device,
since the filters list is contained in the private data.

Fixes: 1eb8c695bda9 ('net/mlx4_en: Add accelerated RFS support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: add xdp forwarding and data write support | Brenden Blanco | 2016-07-20 | 1 | -0/+29

A user will now be able to loop packets back out of the same port
using a bpf program attached to the xdp hook.  Updates to the packet
contents from the bpf program are also supported.

For the packet write feature to work, the rx buffers are now mapped as
bidirectional when the page is allocated.  This occurs only when the
xdp hook is active.

When the program returns a TX action, enqueue the packet directly to a
dedicated tx ring, so as to avoid any locking completely.  This
requires the tx ring to be allocated 1:1 for each rx ring, as well as
the tx completion running in the same softirq.

Upon tx completion, this dedicated tx ring recycles pages without
unmapping, directly back to the original rx ring.  In a steady-state
tx/drop workload, effectively 0 page allocs/frees will occur.

In order to separate out the paths between free and recycle, a
free_tx_desc func pointer is introduced that is optionally updated
whenever recycle_ring is activated.  By default the original free
function is always initialized.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: add page recycle to prepare rx ring for tx support | Brenden Blanco | 2016-07-20 | 1 | -1/+37

The mlx4 driver by default allocates order-3 pages for the ring to
consume in multiple fragments.  When the device has an xdp program,
this behavior will prevent tx actions since the page must be re-mapped
in TODEVICE mode, which cannot be done if the page is still shared.

Start by making the allocator configurable based on whether xdp is
running, such that order-0 pages are always used and never shared.

Since this will stress the page allocator, add a simple page cache to
each rx ring.  Pages in the cache are left dma-mapped, and in
drop-only stress tests the page allocator is eliminated from the perf
report.

Note that setting an xdp program will now require the rings to be
reconfigured.

Before:
 26.91%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
 17.88%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
  6.00%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
  4.49%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
  3.21%  swapper      [kernel.vmlinux]  [k] intel_idle
  2.73%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
  2.57%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq

After:
 31.72%  swapper      [kernel.vmlinux]  [k] intel_idle
  8.79%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
  7.54%  swapper      [kernel.vmlinux]  [k] poll_idle
  6.36%  swapper      [mlx4_core]       [k] mlx4_eq_int
  4.21%  swapper      [kernel.vmlinux]  [k] tasklet_action
  4.03%  swapper      [kernel.vmlinux]  [k] cpuidle_enter_state
  3.43%  swapper      [mlx4_en]         [k] mlx4_en_prepare_rx_desc
  2.18%  swapper      [kernel.vmlinux]  [k] native_irq_return_iret
  1.37%  swapper      [kernel.vmlinux]  [k] menu_select
  1.09%  swapper      [kernel.vmlinux]  [k] bpf_map_lookup_elem

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
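The per-ring cache itself is tiny; a sketch of the idea (the size and
member types are illustrative, not the patch's exact layout):

    struct mlx4_en_page_cache {
            u32 index;              /* number of cached entries */
            struct page *buf[128];  /* pages kept dma-mapped */
    };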
* | net/mlx4_en: add support for fast rx drop bpf program | Brenden Blanco | 2016-07-20 | 1 | -0/+60

Add support for the BPF_PROG_TYPE_XDP hook in the mlx4 driver.

In tc/socket bpf programs, helpers linearize skb fragments as needed
when the program touches the packet data.  However, in the pursuit of
speed, XDP programs will not be allowed to use these slower functions,
especially if it involves allocating an skb.  Therefore, disallow MTU
settings that would produce a multi-fragment packet that XDP programs
would fail to access.  Future enhancements could be done to increase
the allowable MTU.

The xdp program is present as a per-ring data structure, but as of yet
it is not possible to set it at that granularity through any ndo.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
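The smallest possible consumer of this hook, for reference (a sketch,
built with clang -O2 -target bpf):

    #include <linux/bpf.h>

    __attribute__((section("xdp"), used))
    int xdp_drop_all(struct xdp_md *ctx)
    {
            /* No skb is ever allocated; the page goes straight back
             * to the RX ring. */
            return XDP_DROP;
    }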
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net | David S. Miller | 2016-06-30 | 1 | -12/+34

Several cases of overlapping changes, except the packet scheduler
conflicts, which deal with the addition of the free list parameter to
qdisc_enqueue().

Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Avoid unregister_netdev at shutdown flow | Eran Ben Elisha | 2016-06-22 | 1 | -2/+15

This allows a clean shutdown, even if some netdev clients do not
release their reference from this netdev.  It is enough to release the
HW resources only as the kernel is shutting down.

Fixes: 2ba5fbd62b25 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * net/mlx4_en: Fix the return value of a failure in VLAN VID add/kill | Kamal Heib | 2016-06-22 | 1 | -7/+11

Modify mlx4_en_vlan_rx_[add/kill]_vid to return an error value in case
of failure.

Fixes: 8e586137e6b6 ('net: make vlan ndo_vlan_rx_[add/kill]_vid return error value')
Signed-off-by: Kamal Heib <kamalh@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlx4e: Do not attempt to offload VXLAN ports that are unrecognized | Alexander Duyck | 2016-06-16 | 1 | -3/+8

The mlx4e driver does not support more than one port for VXLAN
offload.  As such, expecting the hardware to offload other ports is
invalid, since it appears the parsing logic is used to perform Tx
checksum and segmentation offloads.  Use the vxlan_port number to
determine in which cases we can apply the offload and in which cases
we cannot.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: Add DCB PFC support through CEE netlink commands | Rana Shahout | 2016-06-23 | 1 | -0/+25

This patch adds support for reading and updating priority flow control
(PFC) attributes in the driver via netlink.

Signed-off-by: Rana Shahout <ranas@mellanox.com>
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* | mlx4_en: Replace ndo_add/del_vxlan_port with ndo_add/del_udp_enc_port | Alexander Duyck | 2016-06-18 | 1 | -21/+20

This change replaces the network device operations for adding or
removing a VXLAN port with operations that are more generically
defined to be used for any UDP offload port but provide a type.  As
such, by just adding a line to verify that the offload type is VXLAN
we can maintain the same functionality.

In addition, I updated the socket address family check so that instead
of excluding IPv6 we instead abort if the type is not IPv4.  This
makes much more sense, as we should only be supporting IPv4 outer
addresses on this hardware.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
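Shape of the converted ndo, as a sketch (the function name and body
are condensed; struct udp_tunnel_info comes from net/udp_tunnel.h):

    static void mlx4_en_add_udp_tunnel_port(struct net_device *dev,
                                            struct udp_tunnel_info *ti)
    {
            if (ti->type != UDP_TUNNEL_TYPE_VXLAN)
                    return;         /* only VXLAN is offloaded */
            if (ti->sa_family != AF_INET)
                    return;         /* IPv4 outer addresses only */
            /* ... record ti->port and queue the offload work ... */
    }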
* | net/mlx4_en: mlx4_en_netpoll() should schedule TX, not RX | Eric Dumazet | 2016-06-10 | 1 | -2/+2

I am not sure mlx4_en_netpoll() is doing anything useful right now.

mlx4 has different NAPI structures for RX and TX, and netpoll only
wants to drain TX queues.

Let's schedule NAPI polls on TX, not RX.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: get rid of private net_device_stats | Eric Dumazet | 2016-05-26 | 1 | -2/+1

We can simply use the standard net_device stats.  We do not need to
clear fields that are already 0.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: get rid of ret_stats | Eric Dumazet | 2016-05-26 | 1 | -5/+6

mlx4 uses a private struct net_device_stats in a vain attempt to avoid
races.  This is buggy because multiple cpus could call
mlx4_en_get_stats() at the same time, so ret_stats cannot guarantee
stable results.

To fix this, we need to switch to ndo_get_stats64(), as this method
provides per-thread storage.

This allows us to reduce mlx4_en_priv bloat.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
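Why this removes the race, as a sketch (4.9-era signature returning
the caller's buffer; the locking shown is illustrative):

    static struct rtnl_link_stats64 *
    mlx4_en_get_stats64(struct net_device *dev,
                        struct rtnl_link_stats64 *stats)
    {
            struct mlx4_en_priv *priv = netdev_priv(dev);

            /* Each caller supplies its own storage, so concurrent
             * readers no longer trample a shared ret_stats. */
            spin_lock_bh(&priv->stats_lock);
            netdev_stats_to_stats64(stats, &dev->stats);
            spin_unlock_bh(&priv->stats_lock);

            return stats;
    }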
* net/mlx4_en: clear some TX ring stats in mlx4_en_clear_stats() | Eric Dumazet | 2016-05-26 | 1 | -0/+4

mlx4_en_clear_stats() clears almost everything, but a few TX ring
fields are missing:
- queue_stopped, wake_queue, tso_packets, xmit_more

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx4_en: fix tx_dropped bug | Eric Dumazet | 2016-05-26 | 1 | -0/+1

1) mlx4_en_xmit() can increment priv->stats.tx_dropped, but this
   variable is overwritten in mlx4_en_DUMP_ETH_STATS().

2) This increment was not SMP safe, as a port might have many TX
   queues.

Add a per-TX-ring tx_dropped to fix these issues.

This is a u32, as mlx4_en_DUMP_ETH_STATS() will add a 32bit field.  So
let's avoid bugs with SNMP agents having to cope with partial
overwraps.  (One of these agents being bond_fold_stats())

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Willem de Bruijn <willemb@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
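Folding the per-ring counters at read time then looks like this sketch
(names follow the commit message):

    unsigned long tx_dropped = 0;
    int i;

    for (i = 0; i < priv->tx_ring_num; i++)
            tx_dropped += READ_ONCE(priv->tx_ring[i]->tx_dropped);
    stats->tx_dropped = tx_dropped;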
* net/mlx4: Avoid wrong virtual mappings | Haggai Abramovsky | 2016-05-06 | 1 | -1/+1

The dma_alloc_coherent() function returns a virtual address which can
be used for coherent access to the underlying memory.  On some
architectures, like arm64, undefined behavior results if this memory
is also accessed via virtual mappings that are not coherent.  Because
of their undefined nature, operations like virt_to_page() return
garbage when passed virtual addresses obtained from
dma_alloc_coherent().  Any subsequent mappings via vmap() of the
garbage page values are unusable and result in bad things like bus
errors (synchronous aborts in ARM64 speak).

The mlx4 driver contains code that does the equivalent of
vmap(virt_to_page(dma_alloc_coherent())), which results in an OOPS
when the device is opened.

Prevent the Ethernet driver from running this problematic code by
forcing it to allocate contiguous memory.  As for the Infiniband
driver, at first we try to allocate contiguous memory, but in case of
failure we fall back to working with fragmented memory.

Signed-off-by: Haggai Abramovsky <hagaya@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reported-by: David Daney <david.daney@cavium.com>
Tested-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>