summaryrefslogtreecommitdiffstats
path: root/drivers/net/macvlan.c
Commit message (Collapse)AuthorAgeFilesLines
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-06-121-1/+0Star
|\ | | | | | | | | | | | | | | | | | | Conflicts: net/core/rtnetlink.c net/core/skbuff.c Both conflicts were very simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: force a list_del() in unregister_netdevice_many()Eric Dumazet2014-06-081-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | unregister_netdevice_many() API is error prone and we had too many bugs because of dangling LIST_HEAD on stacks. See commit f87e6f47933e3e ("net: dont leave active on stack LIST_HEAD") In fact, instead of making sure no caller leaves an active list_head, just force a list_del() in the callee. No one seems to need to access the list after unregister_netdevice_many() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: Support bonding eventsVlad Yasevich2014-06-051-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bonding and team drivers generate specific events during failover that trigger switch updates. When a macvlan device is configured on top of bonding, we want switches to learn about the macvlan devices as well. This patch adds a handler to macvlan driver to propagate these events to all macvlan devices. We let the generic inetdev event handler do the work. This allows macvlan to operated correctly over active-backup mode bond. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: add netpoll supportdingtianhong2014-06-031-1/+65
| | | | | | | | | | | | | | | | | | Add netpoll support to macvlan devices. Based on the netpoll support in the 802.1q vlan code. Tested and macvlan could work well with netconsole. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: fix the problem when mac address changes for passthru modedingtianhong2014-06-031-12/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The macvlan dev should always have the same mac address like lowerdev when in the passthru mode, change the mac address alone will break the work mechanism, so when the lowerdev or macvlan mac address changes, we should propagate the changes to another dev. v1->v2: Allow macvlan dev to change mac address for passthru mode and propagate to lowerdev. v2->v3: Don't set the mac address to the lower dev's unicast address for passthru mode when mac address changes. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-05-241-4/+14
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: Fix lockdep warnings with stacked macvlan devicesVlad Yasevich2014-05-171-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Macvlan devices try to avoid stacking, but that's not always successfull or even desired. As an example, the following configuration is perefectly legal and valid: eth0 <--- macvlan0 <---- vlan0.10 <--- macvlan1 However, this configuration produces the following lockdep trace: [ 115.620418] ====================================================== [ 115.620477] [ INFO: possible circular locking dependency detected ] [ 115.620516] 3.15.0-rc1+ #24 Not tainted [ 115.620540] ------------------------------------------------------- [ 115.620577] ip/1704 is trying to acquire lock: [ 115.620604] (&vlan_netdev_addr_lock_key/1){+.....}, at: [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.620686] but task is already holding lock: [ 115.620723] (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.620795] which lock already depends on the new lock. [ 115.620853] the existing dependency chain (in reverse order) is: [ 115.620894] -> #1 (&macvlan_netdev_addr_lock_key){+.....}: [ 115.620935] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.620974] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621019] [<ffffffffa07296c3>] vlan_dev_set_rx_mode+0x53/0x110 [8021q] [ 115.621066] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621105] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621143] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] -> #0 (&vlan_netdev_addr_lock_key/1){+.....}: [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b [ 115.621174] other info that might help us debug this: [ 115.621174] Possible unsafe locking scenario: [ 115.621174] CPU0 CPU1 [ 115.621174] ---- ---- [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] lock(&macvlan_netdev_addr_lock_key); [ 115.621174] lock(&vlan_netdev_addr_lock_key/1); [ 115.621174] *** DEADLOCK *** [ 115.621174] 2 locks held by ip/1704: [ 115.621174] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff815e6dbb>] rtnetlink_rcv+0x1b/0x40 [ 115.621174] #1: (&macvlan_netdev_addr_lock_key){+.....}, at: [<ffffffff815da5be>] dev_set_rx_mode+0x1e/0x40 [ 115.621174] stack backtrace: [ 115.621174] CPU: 3 PID: 1704 Comm: ip Not tainted 3.15.0-rc1+ #24 [ 115.621174] Hardware name: Hewlett-Packard HP xw8400 Workstation/0A08h, BIOS 786D5 v02.38 10/25/2010 [ 115.621174] ffffffff82339ae0 ffff880465f79568 ffffffff816ee20c ffffffff82339ae0 [ 115.621174] ffff880465f795a8 ffffffff816e9e1b ffff880465f79600 ffff880465b019c8 [ 115.621174] 0000000000000001 0000000000000002 ffff880465b019c8 ffff880465b01230 [ 115.621174] Call Trace: [ 115.621174] [<ffffffff816ee20c>] dump_stack+0x4d/0x66 [ 115.621174] [<ffffffff816e9e1b>] print_circular_bug+0x200/0x20e [ 115.621174] [<ffffffff810d4d43>] __lock_acquire+0x1773/0x1a60 [ 115.621174] [<ffffffff810d3172>] ? trace_hardirqs_on_caller+0xb2/0x1d0 [ 115.621174] [<ffffffff810d57f2>] lock_acquire+0xa2/0x130 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff816f62e7>] _raw_spin_lock_nested+0x37/0x50 [ 115.621174] [<ffffffff815df49c>] ? dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffff815df49c>] dev_uc_sync+0x3c/0x80 [ 115.621174] [<ffffffffa0696d2a>] macvlan_set_mac_lists+0xca/0x110 [macvlan] [ 115.621174] [<ffffffff815da557>] __dev_set_rx_mode+0x57/0xa0 [ 115.621174] [<ffffffff815da5c6>] dev_set_rx_mode+0x26/0x40 [ 115.621174] [<ffffffff815da6be>] __dev_open+0xde/0x140 [ 115.621174] [<ffffffff815da9ad>] __dev_change_flags+0x9d/0x170 [ 115.621174] [<ffffffff815daaa9>] dev_change_flags+0x29/0x60 [ 115.621174] [<ffffffff811e1db1>] ? mem_cgroup_bad_page_check+0x21/0x30 [ 115.621174] [<ffffffff815e7f11>] do_setlink+0x321/0x9a0 [ 115.621174] [<ffffffff810d394c>] ? __lock_acquire+0x37c/0x1a60 [ 115.621174] [<ffffffff815ea59f>] rtnl_newlink+0x51f/0x730 [ 115.621174] [<ffffffff815ea169>] ? rtnl_newlink+0xe9/0x730 [ 115.621174] [<ffffffff815e6e75>] rtnetlink_rcv_msg+0x95/0x250 [ 115.621174] [<ffffffff810d329d>] ? trace_hardirqs_on+0xd/0x10 [ 115.621174] [<ffffffff815e6dbb>] ? rtnetlink_rcv+0x1b/0x40 [ 115.621174] [<ffffffff815e6de0>] ? rtnetlink_rcv+0x40/0x40 [ 115.621174] [<ffffffff81608b19>] netlink_rcv_skb+0xa9/0xc0 [ 115.621174] [<ffffffff815e6dca>] rtnetlink_rcv+0x2a/0x40 [ 115.621174] [<ffffffff81608150>] netlink_unicast+0xf0/0x1c0 [ 115.621174] [<ffffffff8160851f>] netlink_sendmsg+0x2ff/0x740 [ 115.621174] [<ffffffff815bc9db>] sock_sendmsg+0x8b/0xc0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff8119d4f8>] ? might_fault+0xa8/0xb0 [ 115.621174] [<ffffffff8119d4af>] ? might_fault+0x5f/0xb0 [ 115.621174] [<ffffffff815cb51e>] ? verify_iovec+0x5e/0xe0 [ 115.621174] [<ffffffff815bd4b9>] ___sys_sendmsg+0x369/0x380 [ 115.621174] [<ffffffff816faa0d>] ? __do_page_fault+0x11d/0x570 [ 115.621174] [<ffffffff810cfe9f>] ? up_read+0x1f/0x40 [ 115.621174] [<ffffffff816fab04>] ? __do_page_fault+0x214/0x570 [ 115.621174] [<ffffffff8120a10b>] ? mntput_no_expire+0x6b/0x1c0 [ 115.621174] [<ffffffff8120a0b7>] ? mntput_no_expire+0x17/0x1c0 [ 115.621174] [<ffffffff8120a284>] ? mntput+0x24/0x40 [ 115.621174] [<ffffffff815bdbb2>] __sys_sendmsg+0x42/0x80 [ 115.621174] [<ffffffff815bdc02>] SyS_sendmsg+0x12/0x20 [ 115.621174] [<ffffffff816ffd69>] system_call_fastpath+0x16/0x1b Fix this by correctly providing macvlan lockdep class. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: Don't propagate IFF_ALLMULTI changes on down interfaces.Peter Christensen2014-05-121-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Clearing the IFF_ALLMULTI flag on a down interface could cause an allmulti overflow on the underlying interface. Attempting the set IFF_ALLMULTI on the underlying interface would cause an error and the log message: "allmulti touches root, set allmulti failed." Signed-off-by: Peter Christensen <pch@ordbogen.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: simplify the structure portdingtianhong2014-05-161-7/+5Star
| | | | | | | | | | | | | | | | | | | | The port->count was used to count the number of macvlan devs in the same port, but the list vlans could play the same role to do that, so free the port if the list vlans is empty and no need to use the parameter count. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: Propagate lowerdev MTU changesdingtianhong2014-05-141-0/+7
| | | | | | | | | | | | | | | | | | | | When the physical MTU changes we should ensure that all existing MACVLAN dev MTU do not exceed the new lowerdev MTU. This patch adds that propagation. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-05-121-3/+0Star
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Revert "macvlan : fix checksums error when we are in bridge mode"Vlad Yasevich2014-04-301-3/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 12a2856b604476c27d85a5f9a57ae1661fc46019. The commit above doesn't appear to be necessary any more as the checksums appear to be correctly computed/validated. Additionally the above commit breaks kvm configurations where one VM is using a device that support checksum offload (virtio) and the other VM does not. In this case, packets leaving virtio device will have CHECKSUM_PARTIAL set. The packets is forwarded to a macvtap that has offload features turned off. Since we use CHECKSUM_UNNECESSARY, the host does does not update the checksum and thus a bad checksum is passed up to the guest. CC: Daniel Lezcano <daniel.lezcano@free.fr> CC: Patrick McHardy <kaber@trash.net> CC: Andrian Nord <nightnord@gmail.com> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Michael S. Tsirkin <mst@redhat.com> CC: Jason Wang <jasowang@redhat.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: Fix leak and NULL dereference on error pathHerbert Xu2014-04-231-4/+7
| | | | | | | | | | | | | | | | | | | | The recent patch that moved broadcasts to process context added a couple of bugs on the error path where we may dereference NULL or leak an skb. This patch fixes them. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: Move broadcasts into a work queueHerbert Xu2014-04-211-23/+93
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently broadcasts are handled in network RX context, where the packets are sent through netif_rx. This means that the number of macvlans will be constrained by the capacity of netif_rx. For example, setting up 4096 macvlans practically causes all broadcast packets to be dropped as the default netif_rx queue size simply can't handle 4096 skbs being stuffed into it all at once. Fundamentally, we need to ensure that the amount of work handled in each netif_rx backlog run is constrained. As broadcasts are anything but constrained, it either needs to be limited per run or moved to process context. This patch picks the second option and moves all broadcast handling bar the trivial case of packets going to a single interface into a work queue. Obviously there also needs to be a limit on how many broadcast packets we postpone in this way. I've arbitrarily chosen tx_queue_len of the master device as the limit (act_mirred also happens to use this parameter in a similar way). In order to ensure we don't exceed the backlog queue we will use netif_rx_ni instead of netif_rx for broadcast packets. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Thanks, Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Replace u64_stats_fetch_begin_bh to u64_stats_fetch_begin_irqEric W. Biederman2014-03-151-2/+2
| | | | | | | | | | | | | | | Replace the bh safe variant with the hard irq safe variant. We need a hard irq safe variant to deal with netpoll transmitting packets from hard irq context, and we need it in most if not all of the places using the bh safe variant. Except on 32bit uni-processor the code is exactly the same so don't bother with a bh variant, just have a hard irq safe variant that everyone can use. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-03-061-2/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/ath/ath9k/recv.c drivers/net/wireless/mwifiex/pcie.c net/ipv6/sit.c The SIT driver conflict consists of a bug fix being done by hand in 'net' (missing u64_stats_init()) whilst in 'net-next' a helper was created (netdev_alloc_pcpu_stats()) which takes care of this. The two wireless conflicts were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: Add support for 'always_on' offload featuresVlad Yasevich2014-03-031-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Macvlan currently inherits all of its features from the lower device. When lower device disables offload support, this causes macvlan to disable offload support as well. This causes performance regression when using macvlan/macvtap in bridge mode. It can be easily demonstrated by creating 2 namespaces using macvlan in bridge mode and running netperf between them: MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 20.00 1204.61 To restore the performance, we add software offload features to the list of "always_on" features for macvlan. This way when a namespace or a guest using macvtap initially sends a packet, this packet will not be segmented at macvlan level. It will only be segmented when macvlan sends the packet to the lower device. MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 87380 16384 16384 20.00 5507.35 Fixes: 6acf54f1cf0a6747bac9fea26f34cfc5a9029523 (macvtap: Add support of packet capture on macvtap device.) Fixes: 797f87f83b60685ff8a13fa0572d2f10393c50d3 (macvlan: fix netdev feature propagation from lower device) CC: Florian Westphal <fw@strlen.de> CC: Christian Borntraeger <borntraeger@de.ibm.com> CC: Jason Wang <jasowang@redhat.com> CC: Michael S. Tsirkin <mst@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-02-191-2/+3
|\| | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/bonding/bond_3ad.h drivers/net/bonding/bond_main.c Two minor conflicts in bonding, both of which were overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: unregister net device when netdev_upper_dev_link() failsCong Wang2014-02-131-2/+3
| | | | | | | | | | | | | | | | | | | | rtnl_newlink() doesn't unregister it for us on failure. Cc: Patrick McHardy <kaber@trash.net> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: introduce netdev_alloc_pcpu_stats() for driversWANG Cong2014-02-141-8/+1Star
|/ | | | | | | | | There are many drivers calling alloc_percpu() to allocate pcpu stats and then initializing ->syncp. So just introduce a helper function for them. Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-01-141-7/+7
|\
| * net: core: explicitly select a txq before doing l2 forwardingJason Wang2014-01-101-6/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The will cause several issues: - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan instead of lower device which misses the necessary txq synchronization for lower device such as txq stopping or frozen required by dev watchdog or control path. - dev_hard_start_xmit() was called with NULL txq which bypasses the net device watchdog. - dev_hard_start_xmit() does not check txq everywhere which will lead a crash when tso is disabled for lower device. Fix this by explicitly introducing a new param for .ndo_select_queue() for just selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also extended to accept this parameter and dev_queue_xmit_accel() was used to do l2 forwarding transmission. With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of dev_queue_xmit() to do the transmission. In the future, it was also required for macvtap l2 forwarding support since it provides a necessary synchronization method. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: e1000-devel@lists.sourceforge.net Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: forbid L2 fowarding offload for macvtapJason Wang2014-01-101-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | L2 fowarding offload will bypass the rx handler of real device. This will make the packet could not be forwarded to macvtap device. Another problem is the dev_hard_start_xmit() called for macvtap does not have any synchronization. Fix this by forbidding L2 forwarding for macvtap. Cc: John Fastabend <john.r.fastabend@intel.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-01-061-3/+13
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/qlogic/qlcnic/qlcnic_sriov_pf.c net/ipv6/ip6_tunnel.c net/ipv6/ip6_vti.c ipv6 tunnel statistic bug fixes conflicting with consolidation into generic sw per-cpu net stats. qlogic conflict between queue counting bug fix and the addition of multiple MAC address support. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: fix netdev feature propagation from lower deviceFlorian Westphal2013-12-261-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are inconsistencies wrt. feature propagation/inheritance between macvlan and the underlying interface. When a feature is turned off on the real device before a macvlan is created on top, these will remain enabled on the macvlan device, whereas turning off the feature on the lower device after macvlan creation the kernel will propagate the changes to the macvlan. The second issue is that, when propagating changes from underlying device to the macvlan interface, macvlan can erronously lose its NETIF_F_LLTX flag, as features are anded with the underlying device. However, LLTX should be kept since it has no dependencies on physical hardware (LLTX is set on macvlan creation regardless of the lower device properties, see 8ffab51b3dfc54876f145f15b351c41f3f703195 (macvlan: lockless tx path). The LLTX flag is now forced regardless of user settings in absence of layer2 hw acceleration (a6cc0cfa72e0b6d9f2c8fd858aa, net: Add layer 2 hardware acceleration operations for macvlan devices). Use netdev_increment_features to rebuild the feature set on capability changes on either the lower device or on the macvlan interface. As pointed out by Ben Hutchings, use netdev_update_features on NETDEV_FEAT_CHANGE event (it calls macvlan_fix_features/netdev_features_change if needed). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: unify macvlan_pcpu_stats and vlan_pcpu_statsLi RongQing2014-01-051-4/+4
| | | | | | | | | | | | | | | | They are same, so unify them as one; since macvlan is a kind of vlan, vlan_pcpu_stats should be a proper name for vlan and macvlan. Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: make start_xmit localstephen hemminger2013-12-281-3/+2Star
| | | | | | | | | | | | | | Only used in one file, no need to expose Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: Remove custom recieve and forward handlersVlad Yasevich2013-12-121-12/+5Star
| | | | | | | | | | | | | | | | | | Since now macvlan and macvtap use the same receive and forward handlers, we can remove them completely and use netif_rx and dev_forward_skb() directly. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: Support creating macvtaps from macvlansKevin Wallace2013-12-061-5/+3Star
|/ | | | | | | | | | | | | | | | | When running in a network namespace whose only link to the outside world is a macvlan device, not being able to create a macvtap off of it is a real pain. So modify macvtap creation to automatically forward a creation of a macvtap on a macvlan to become a creation of a macvtap on the underlying network device, just like is currently done with macvlan-on-macvlan devices. v2: Use netif_is_macvlan and macvlan_dev_real_dev helpers to make it more clear what we're doing. Signed-off-by: Kevin Wallace <kevin@pentabarf.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'core-locking-for-linus' of ↵Linus Torvalds2013-11-141-0/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core locking changes from Ingo Molnar: "The biggest changes: - add lockdep support for seqcount/seqlocks structures, this unearthed both bugs and required extra annotation. - move the various kernel locking primitives to the new kernel/locking/ directory" * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits) block: Use u64_stats_init() to initialize seqcounts locking/lockdep: Mark __lockdep_count_forward_deps() as static lockdep/proc: Fix lock-time avg computation locking/doc: Update references to kernel/mutex.c ipv6: Fix possible ipv6 seqlock deadlock cpuset: Fix potential deadlock w/ set_mems_allowed seqcount: Add lockdep functionality to seqcount/seqlock structures net: Explicitly initialize u64_stats_sync structures for lockdep locking: Move the percpu-rwsem code to kernel/locking/ locking: Move the lglocks code to kernel/locking/ locking: Move the rwsem code to kernel/locking/ locking: Move the rtmutex code to kernel/locking/ locking: Move the semaphore core to kernel/locking/ locking: Move the spinlock code to kernel/locking/ locking: Move the lockdep code to kernel/locking/ locking: Move the mutex code to kernel/locking/ hung_task debugging: Add tracepoint to report the hang x86/locking/kconfig: Update paravirt spinlock Kconfig description lockstat: Report avg wait and hold times lockdep, x86/alternatives: Drop ancient lockdep fixup message ...
| * net: Explicitly initialize u64_stats_sync structures for lockdepJohn Stultz2013-11-061-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to enable lockdep on seqcount/seqlock structures, we must explicitly initialize any locks. The u64_stats_sync structure, uses a seqcount, and thus we need to introduce a u64_stats_init() function and use it to initialize the structure. This unfortunately adds a lot of fairly trivial initialization code to a number of drivers. But the benefit of ensuring correctness makes this worth while. Because these changes are required for lockdep to be enabled, and the changes are quite trivial, I've not yet split this patch out into 30-some separate patches, as I figured it would be better to get the various maintainers thoughts on how to best merge this change along with the seqcount lockdep enablement. Feedback would be appreciated! Signed-off-by: John Stultz <john.stultz@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Cc: Jesse Gross <jesse@nicira.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mirko Lindner <mlindner@marvell.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Roger Luethi <rl@hellgate.ch> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Simon Horman <horms@verge.net.au> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Wensong Zhang <wensong@linux-vs.org> Cc: netdev@vger.kernel.org Link: http://lkml.kernel.org/r/1381186321-4906-2-git-send-email-john.stultz@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | net: Add layer 2 hardware acceleration operations for macvlan devicesJohn Fastabend2013-11-081-1/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a operations structure that allows a network interface to export the fact that it supports package forwarding in hardware between physical interfaces and other mac layer devices assigned to it (such as macvlans). This operaions structure can be used by virtual mac devices to bypass software switching so that forwarding can be done in hardware more efficiently. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan: resolve ENOENT errors on creationJohn Fastabend2013-10-231-6/+5Star
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the commit below attempting to create macvlan devices was resulting in ENOENT errors, # ip link add link p3p2 type macvlan RTNETLINK answers: Invalid argument This happens because netdev_upper_dev_link() is called before register_netdevice() in the macvlan code. Through a call chain this results in a call to __netdev_adjacent_dev_insert() and finally a sysfs_create_link(). This requires the kobject of the macvlan to be registered which is done in register_netdevice(). If there is no kobject which is the case here the ENOENT error is seen on the command line. To resolve this move the netdev_upper_dev_link() call below the register_netdevice() call. This aligns with vlan driver flow. Regression introduced here, commit 5831d66e8097aedfa3bc35941cf265ada2352317 Author: Veaceslav Falico <vfalico@redhat.com> Date: Wed Sep 25 09:20:32 2013 +0200 net: create sysfs symlinks for neighbour devices CC: Veaceslav Falico <vfalico@redhat.com> CC: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Veaceslav Falico <vfalico@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvlan: Move skb_clone check closer to callHerbert Xu2013-09-111-4/+6
| | | | | | | | | | | | | | | | | Currently macvlan calls skb_clone in macvlan_broadcast but checks for a NULL return in macvlan_broadcast_one instead. This is needlessly confusing and may lead to bugs introduced later. This patch moves the error check to where the skb_clone call is. The only other caller of macvlan_broadcast_one never passes in a NULL value so it doesn't need the check either. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Thanks, Reviewed-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: macvlan: inherit addr_assign_type along with dev_addrBjørn Mork2013-09-041-1/+1
| | | | | | | | | A device inheriting a random or set address should reflect this in its addr_assign_type. Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvlan: fix typo in assignmentLutz Jaenicke2013-08-301-1/+1
| | | | | | | | | | | commit 3b04ddde02cf1b6f14f2697da5c20eca5715017f "[NET]: Move hardware header operations out of netdevice." moved the handling into macvlan setup adding dev->header_ops = &macvlan_hard_header_ops, At the end of the line the ',' should have been a ';' Signed-off-by: Lutz Jaenicke <ljaenicke@innominate.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-171-0/+4
|\
| * macvlan: validate flagsMichael S. Tsirkin2013-08-051-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit df8ef8f3aaa6692970a436204c4429210addb23a macvlan: add FDB bridge ops and macvlan flags added a flags field to macvlan, which can be controlled from userspace. The idea is to make the interface future-proof so we can add flags and not new fields. However, flags value isn't validated, as a result, userspace can't detect which flags are supported. Cc: "David S. Miller" <davem@davemloft.net> Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-041-4/+19
|\| | | | | | | | | | | | | Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: handle set_promiscuity failuresMichael S. Tsirkin2013-08-021-2/+5
| | | | | | | | | | | | | | | | | | It's quite unlikely that dev_set_promiscuity will fail, but worth checking just in case. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: better mode validationMichael S. Tsirkin2013-08-021-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | macvlan passthrough mode is special: it's not possible to switch to or from it through a netlink command. But if you try, the command will succeed, which is confusing. Validate input and return error to user. Cc: Sridhar Samudrala <sri@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | macvlan fdb replace supportThomas Richter2013-07-241-0/+3
|/ | | | | | | | | | | | Add support for iproute2 command 'bridge fdb replace ...'. The rtnletlink call back function ndo_fdb_add will be called with the NLM_F_REPLACE flag set. Simply return -EOPNOTSUP. Resubmitted because net-next was closed last week. Signed-off-by: Thomas Richter <tmricht@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvtap: Let TUNSETOFFLOAD actually controll offload features.Vlad Yasevich2013-06-261-0/+10
| | | | | | | | | | | When the user issues TUNSETOFFLOAD ioctl, macvtap does not do anything other then to verify arguments. This patch adds functionality to allow users to actually control offload features. NETIF_F_GSO and NETIF_F_GRO are always on, but the rest of the features can be controlled. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-06-201-7/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/ath/ath9k/Kconfig drivers/net/xen-netback/netback.c net/batman-adv/bat_iv_ogm.c net/wireless/nl80211.c The ath9k Kconfig conflict was a change of a Kconfig option name right next to the deletion of another option. The xen-netback conflict was overlapping changes involving the handling of the notify list in xen_netbk_rx_action(). Batman conflict resolution provided by Antonio Quartulli, basically keep everything in both conflict hunks. The nl80211 conflict is a little more involved. In 'net' we added a dynamic memory allocation to nl80211_dump_wiphy() to fix a race that Linus reported. Meanwhile in 'net-next' the handlers were converted to use pre and post doit handlers which use a flag to determine whether to hold the RTNL mutex around the operation. However, the dump handlers to not use this logic. Instead they have to explicitly do the locking. There were apparent bugs in the conversion of nl80211_dump_wiphy() in that we were not dropping the RTNL mutex in all the return paths, and it seems we very much should be doing so. So I fixed that whilst handling the overlapping changes. To simplify the initial returns, I take the RTNL mutex after we try to allocate 'tb'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * macvlan: don't touch promisc without passthroughMichael S. Tsirkin2013-06-131-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit df8ef8f3aaa6692970a436204c4429210addb23a "macvlan: add FDB bridge ops and macvlan flags" added a way to control NOPROMISC macvlan flag through netlink. However, with a non passthrough device we never set promisc on open, even if NOPROMISC is off. As a result: If userspace clears NOPROMISC on open, then does not clear it on a netlink command, promisc counter is not decremented on stop and there will be no way to clear it once macvlan is detached. If userspace does not clear NOPROMISC on open, then sets NOPROMISC on a netlink command, promisc counter will be decremented from 0 and overflow to fffffffff with no way to clear promisc. To fix, simply ignore NOPROMISC flag in a netlink command for non-passthrough devices, same as we do at open/close. Since we touch this code anyway - check dev_set_promiscuity return code and pass it to users (though an error here is unlikely). Cc: "David S. Miller" <davem@davemloft.net> Reviewed-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: pass info struct via netdevice notifierJiri Pirko2013-05-281-1/+1
|/ | | | | | | | | | | | | | So far, only net_device * could be passed along with netdevice notifier event. This patch provides a possibility to pass custom structure able to provide info that event listener needs to know. Signed-off-by: Jiri Pirko <jiri@resnulli.us> v2->v3: fix typo on simeth shortened dev_getter shortened notifier_info struct name v1->v2: fix notifier_call parameter in call_netdevice_notifier() Signed-off-by: David S. Miller <davem@davemloft.net>
* macvlan: fix passthru mode race between dev removal and rx pathJiri Pirko2013-05-121-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if macvlan in passthru mode is created and data are rxed and you remove this device, following panic happens: NULL pointer dereference at 0000000000000198 IP: [<ffffffffa0196058>] macvlan_handle_frame+0x153/0x1f7 [macvlan] I'm using following script to trigger this: <script> while [ 1 ] do ip link add link e1 name macvtap0 type macvtap mode passthru ip link set e1 up ip link set macvtap0 up IFINDEX=`ip link |grep macvtap0 | cut -f 1 -d ':'` cat /dev/tap$IFINDEX >/dev/null & ip link del dev macvtap0 done </script> I run this script while "ping -f" is running on another machine to send packets to e1 rx. Reason of the panic is that list_first_entry() is blindly called in macvlan_handle_frame() even if the list was empty. vlan is set to incorrect pointer which leads to the crash. I'm fixing this by protecting port->vlans list by rcu and by preventing from getting incorrect pointer in case the list is empty. Introduced by: commit eb06acdc85585f2 "macvlan: Introduce 'passthru' mode to takeover the underlying device" Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: vlan: announce STAG offload capability in some driversPatrick McHardy2013-04-191-1/+1
| | | | | | | | | - macvlan: propagate STAG filtering capabilities from underlying device - ifb: announce STAG tagging support in addition to CTAG tagging support - veth: announce STAG tagging/stripping support in addition to CTAG support Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: vlan: prepare for 802.1ad VLAN filtering offloadPatrick McHardy2013-04-191-4/+4
| | | | | | | | | Change the rx_{add,kill}_vid callbacks to take a protocol argument in preparation of 802.1ad support. The protocol argument used so far is always htons(ETH_P_8021Q). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: vlan: rename NETIF_F_HW_VLAN_* feature flags to NETIF_F_HW_VLAN_CTAG_*Patrick McHardy2013-04-191-1/+1
| | | | | | | | | | Rename the hardware VLAN acceleration features to include "CTAG" to indicate that they only support CTAGs. Follow up patches will introduce 802.1ad server provider tagging (STAGs) and require the distinction for hardware not supporting acclerating both. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>