summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* bridge: multicast: notify on group deleteNikolay Aleksandrov2015-07-201-0/+2
| | | | | | | | | Group notifications were not sent when a group expired or was deleted due to bridge/port device being deleted. So add br_mdb_notify() to br_multicast_del_pg(). Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'bpf_cgroup_classid'David S. Miller2015-07-204-21/+53
|\ | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== BPF update This small helper allows for accessing net_cls cgroups classid. Please see individual patches for more details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * ebpf: add helper to retrieve net_cls's classid cookieDaniel Borkmann2015-07-202-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It would be very useful to retrieve the net_cls's classid from an eBPF program to allow for a more fine-grained classification, it could be directly used or in conjunction with additional policies. I.e. docker, but also tooling such as cgexec, can easily run applications via net_cls cgroups: cgcreate -g net_cls:/foo echo 42 > foo/net_cls.classid cgexec -g net_cls:foo <prog> Thus, their respecitve classid cookie of foo can then be looked up on the egress path to apply further policies. The helper is desigend such that a non-zero value returns the cgroup id. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * cls_cgroup: factor out classid retrievalDaniel Borkmann2015-07-202-21/+31
|/ | | | | | | | | | | | Split out retrieving the cgroups net_cls classid retrieval into its own function, so that it can be reused later on from other parts of the traffic control subsystem. If there's no skb->sk, then the small helper returns 0 as well, which in cls_cgroup terms means 'could not classify'. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* enic: allow adaptive coalesce setting for msi/legacy intrGovindarajulu Varadarajan2015-07-201-48/+65
| | | | | | | | | | | | | | | | | | | | * Allow setting of adaptive coalescing setting for all types of interrupt. * In msi & legacy intr, we use single interrupt for rx & tx. In this case tx_coalesce_usecs is invalid. We should use only rx_coalesce_usecs. Do not display tx_coal values for msi/intx. And do not allow user to set this as well. * Driver supports only tx/rx_coalesce_usec and adaptive coalesce settings. For other values, driver does not return error. So ethtool succeeds for unsupported values. Introduce enic_coalesce_valid() function to validate the coalescing values. * If user requests for coalesce value greater than what adaptor supports, driver uses the max value. We should at least log this. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* enic: add adaptive coalescing intr for intx and msi pollGovindarajulu Varadarajan2015-07-201-68/+67Star
| | | | | | | | | | | | | Adaptive interrupt coalescing is available for msix. This patch adds the support for msi poll. Interface for adaptive interrupt coalescing is already added in driver. We just did not enable it for legacy intr & msi. enic_calc_int_moderation() & enic_set_int_moderation() are defined as static after enic_poll. Since enic_poll needs it, move both of these function definitions above enic_poll. No change in functionality. Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Remove unused arguments for __ipv6_dev_get_saddr().YOSHIFUJI Hideaki2015-07-161-4/+2Star
| | | | | Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'protodown'David S. Miller2015-07-166-3/+77
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Anuradha Karuppiah says: ==================== net: Introduce protodown flag. User space daemons can detect errors in the network that need to be notified to the switch device drivers. Drivers can react to this error state by doing a phy-down on the switch-port which would result in a carrier-off locally and on the directly connected switch. Doing that would prevent loops and black-holes in the network. One such use case is the multi-chassis LAG application - 1. The MLAG application runs on peer switches (say Switch0 and Switch1) synchronizing states, forwarding entries etc. between the two switches over the peer-link (this is a link directly connecting the two switches). 2. An MLAG election process designates one of the switches as a primary (for e.g. Switch0 is primary and Switch1 is secondary). 3. The peer link plays a critical role in allowing Switch0-Switch1 to function as a single LAG partner to the downstream dual-connected servers. When the peer-link between the switches goes down we have a split-brain situation. Switch0 and Switch1 are no longer in sync and are acting independently. This can result in traffic loops and traffic black-holing in the network. 4. To prevent these problems the MLAG application on the secondary switch phy-downs the MLAG ports on detecting the peer-link down. This will be seen as a carrier down on servers that are dual-connected to Switch0 and Switch1. 5. Specifically a dual-connected server will see a carrier-down on the port connected to the MLAG secondary, Switch1, and will stop using that port for traffic TX. So traffic black holing is prevented. v6 to v7: Removed some unnecessary code in response to review comments. v5 to v6: Replaced proto_flags with a simple proto_down boolean attribute in response to Dave's comments. v4 to v5: Changed the ip link display format for protodown to match the set as recommended by Stephen. v3 to v4: I have moved protodown out of IFF_XXX and introduced a separate proto_flags field with IF_PROTOF_DOWN bit being used by apps to notify switch port errors. This is in response to Stephen's comments that adding a new IFF_XXX may break user space. I have used rocker as the sample switch driver. And to test this functionality I used the qemu-rocker patch that Scott sent out in response to the v3 posting (needed to set link up/down when phy is enabled/disabled). v1 to v2: Based on Dave's suggestion I have moved out aggregating of error bits across applications to a user space framework. This patch now simply notifies an aggregated error bit to drivers enabling them to handle the error gracefully. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * rocker: Handle protodown notifications.Anuradha Karuppiah2015-07-161-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | protodown can be set by user space applications like MLAG on detecting errors on a switch port. This patch provides sample switch driver changes for handling protodown. Rocker PHYS disables the port in response to protodown. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netlink: changes for setting and clearing protodown via netlink.Anuradha Karuppiah2015-07-162-2/+15
| | | | | | | | | | | | | | | | Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net core: Add protodown support.Anuradha Karuppiah2015-07-163-0/+48
|/ | | | | | | | | | | | | | | This patch introduces the proto_down flag that can be used by user space applications to notify switch drivers that errors have been detected on the device. The switch driver can react to protodown notification by doing a phys down on the associated switch port. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com> Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ibmveth: add support for TSO6Thomas Falcon2015-07-162-28/+135
| | | | | | | | | This patch adds support for a new method of signalling the firmware that TSO packets are being sent. The new method removes the need to alter the ip and tcp checksums and allows TSO6 support. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* hv_netvsc: Add close of RNDIS filter into change mtu callHaiyang Zhang2015-07-161-6/+52
| | | | | | | | | | | | | | The current change mtu call only stops tx before removing RNDIS filter. In case ringbufer is not empty, the rndis_filter_device_remove() may hang on removing the buffers. This patch adds close of RNDIS filter before removing it, also a gradual waiting loop until the ring is empty. The change_mtu hang issue under heavy traffic is solved by this patch. Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Reviewed-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix finding best source address in ipv6_dev_get_saddr().YOSHIFUJI Hideaki/吉藤英明2015-07-161-12/+18
| | | | | | | | | | | | | | | | | | Commit 9131f3de2 ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") did not properly update best source address available. Plus, it introduced possible NULL pointer dereference. Bug was reported by Erik Kline <ek@google.com>. Based on patch proposed by Hajime Tazaki <thehajime@gmail.com>. Fixes: 9131f3de24db4dc12199aede7d931e6703e97f3b ("ipv6: Do not iterate over all interfaces when finding source address on specific interface.") Signed-off-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com> Acked-by: Hajime Tazaki <thehajime@gmail.com> Acked-by: Erik Kline <ek@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2015-07-1615-79/+155
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2015-07-14 This series contains updates to i40e and i40evf only. Joe Stringer and Jesse Gross add a ndo_features_check function to ensure that the i40e driver does not try to offload packets that exceed 80 bytes in length. Anjali adds additional stats to track flow director ATR and SB current state and flow director flush count which will help the need for verbose debug logs with respect to flow director. Also refines an error message to avoid confusion, so that it indicates what may have really happened when the init_shared_code() call possibly fails. Pawel adds new fields to the capabilities structures to handle Flex-10 device/function capabilities which is needed to support Flex-10 configs. Jesse improves the transmit performance by added a prefetch for the next transmit descriptor to be used when we know there are more coming. Mitch modifies i40evf driver to handle/allow an abundance of vectors. Currently the driver only maps transmit and receive queues to a single MSI-X vector per queue if there are exactly enough vectors for this, but if we have too many vectors, it will fail and allocate queues to vectors in a suboptimal manner. So change the condition check to allow for an excess number of vectors and won't use the extras. Also update the driver to just return success if the user attempts to set a port VLAN on a VF that already has the same port VLAN configured, instead of going through unnecessary filter removals & adds. Fix the MAC filters for VFs, which were being programmed with 0 for the VLAN value when there was no VLAN assigned. Instead, we must use -1 to indicate that no VLAN is in use. Fix the VF disable code, which was not properly cleaning up the VF and would leave the VF in an indeterminate state, so fix this by notifying the VF and then call the normal VF reset routine. Fix the logic in the driver so that MAC filters are added and removed correctly and added a check for the driver's hardware MAC address so that this filter does not get removed incorrectly. Carolyn removes incorrect #ifdef's which should not have been added in the first place and with the #ifdef's removed, make the necessary changes in the driver to resolve compile errors. Greg updates the admin queue command header defines. v2: fix indentation in patch 12 based on feedback from Sergei Shtylyov ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * i40e/i40evf: Bump version to 1.3.6 for i40e and 1.3.2 for i40evfCatherine Sullivan2015-07-152-2/+2
| | | | | | | | | | | | | | | | | | Bump. Change-ID: I84573d9fa51effc5b29bf5b8c74e3cc8b2673f48 Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: Refine an error message to avoid confusionAnjali Singhai Jain2015-07-151-1/+2
| | | | | | | | | | | | | | | | | | | | Change a warning message to indicate what may have really happened when the init_shared_code call fails. Change-ID: I616ace40fed120d0dec86dfc91ab2d7cde466904 Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e/i40evf: Add support for pre-allocated pages for PDFaisal Latif2015-07-154-13/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The i40e_add_pd_table_entry() routine is being modified to handle both cases where a backing page is passed and where backing page is allocated in i40e_add_pd_table_entry(). For PBLE resource management, it is more efficient for it to manage its backing pages. For VF, PBLE backing page addresses will be send to PF driver for PBLE resource. The i40e_remove_pd_bp() is also modified to not free pre-allocated pages and free only ones which were allocated in i40e_add_pd_table_entry(). Change-ID: Ie673f0403f22979e9406f5a94048dceb91bcf9a8 Signed-off-by: Faisal Latif <faisal.latif@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40evf: add MAC address filter in open, not initMitch Williams2015-07-151-11/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | During close, all of the MAC filters are cleared, so the driver would be unable to receive unicast packets after being closed and reopened. Add the adapter's "hardware" MAC address filter in open, not init. This ensures that the correct filter is present each time. Change-ID: I51a11e9c1200139dab6f66a5353bd38c7d26f875 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40evf: don't delete all the filtersMitch Williams2015-07-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to an inverted conditional, the driver was marking all of its MAC filters for deletion every time set_rx_mode was called. Depending upon the timing of the calls to set_rx_mode and the processing of the admin queue, the driver would (accidentally) end up with a varying number of functional filters. Correct this logic so that MAC filters are added and removed correctly. Add a check for the driver's "hardware" MAC address so that this filter doesn't get removed incorrectly. Change-ID: Ib3e7c4a5b53df6835f164fe44cb778cb71f8aff8 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: un-disable VF after resetMitch Williams2015-07-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | When a VF is disabled, there is no way for it to recover until either the PF driver is reloaded or SR-IOV is disabled and enabled. To correct this, enable the VF after a successful reset. Change-ID: I9e0788476c4d53d5407961b503febdfff2b8a7c6 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: do a proper reset when disabling a VFMitch Williams2015-07-151-7/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | The VF disable code was just whanging on the reset bit without properly cleaning up the VF, which would leave the VF in an indeterminate state from which it could not recover. Fix this by notifying the VF and then by calling the normal VF reset routine. Change-ID: I862b9dfa919368773cbdc212b805b520db2f7430 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: correctly program filters for VFsMitch Williams2015-07-151-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | MAC filters for VFs were being programmed with 0 for the VLAN value when there was no VLAN assigned. This is incorrect and actually assigns the VF to VLAN 0. Instead, we must use -1 to indicate that no VLAN is in use. This change programs the filters correctly and gets rid of a bogus error message when setting a port VLAN on an active VF. Change-ID: Ica9a9906d768405377ff3308e27f7d0b5b2ea96e Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e/i40evf: Update the admin queue command headerGreg Rose2015-07-152-22/+20Star
| | | | | | | | | | | | | | | | | | Make the necessary updates to i40e_adminq_cmd.h. Change-ID: Ib031c86cc6cab78e5aa44c64d8ce5474be8d7e42 Signed-off-by: Greg Rose <gregory.v.rose@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: Remove incorrect #ifdef'sCarolyn Wyborny2015-07-151-10/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes some #ifdef's that should not be there. They were stopping code that is needed from being compiled in. With these #ifdef's removed, changes are needed in the driver to fix some compile errors: adding missing parameters to the definition of ndo_bridge_setlink and a ndo_dflt_brige_getlink call. Change-ID: I5516614e1bc50b6bca0647cef971bc96161ba2de Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: ignore duplicate port VLAN requestsMitch Williams2015-07-151-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | If user attempts to set a port VLAN on a VF that already has the same port VLAN configured, the driver will go through a completely unnecessary flurry of filter removals and filter adds. Just check for this condition and return success instead of doing a bunch of busywork. Change-ID: Ia1a9e83e6ed48b3f4658bc20dfc6af0cf525d54a Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40evf: Allow for an abundance of vectorsMitch Williams2015-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The driver currently only maps TX and RX queues to a single MSI-X vector per queue pair if there are exactly enough vectors for this. Unfortunately, if we have too many vectors it will fail and allocate queues to vectors in a suboptimal manner. Change the condition check to allow for excess vectors. In this case, the extras just won't be used. Change-ID: I23e1e2955c64739c86612db88a25583e6a7e0b17 Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e/i40evf: improve Tx performance with a small tweakJesse Brandeburg2015-07-152-0/+4
| | | | | | | | | | | | | | | | | | | | Add a prefetch for the next Tx descriptor to be used when we know there are more coming. Change-ID: Ibb9acab11d508eec2db7da795df74debc16eeacb Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e/i40evf: Update Flex-10 related device/function capabilitiesPawel Orlowski2015-07-154-8/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Flex10 device/function capability has been upgraded to include information needed to support Flex-10 configurations. This patch adds new fields to the i40e_hw_capabilities structure and updates i40e_parse_discover_capabilities functions to extract them from the AQ response. Naming convention has changed to use flex10 mode instead of existing mfp_mode_1. Change-ID: I305dd888866985a30293acb3fb14fa43ca6b79ea Signed-off-by: Pawel Orlowski <pawel.orlowski@intel.com> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e/i40evf: Add stats to track FD ATR and SB dynamic enable stateAnjali Singhai Jain2015-07-154-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | Since the driver can dynamically enable/disable FD ATR and SB features, these stats help keep track of the current state and along with fd_flush count provide a means to debug what could be going on with the flow director filters. This will take away the need for being verbose in our debug logs with respect to FD. Change-ID: I29224f750fe6602391043655d18996570720377d Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * i40e: Implement ndo_features_check()Joe Stringer2015-07-151-0/+20
| | | | | | | | | | | | | | | | | | | | | | i40e supports UDP tunnel headers up to 80 bytes in length, so this adds a check to ensure that it doesn't try to offload packets that exceed that. Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | pkt_sched: sch_qfq: remove unused member of struct qfq_schedAndrea Parri2015-07-161-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | The member (u32) "num_active_agg" of struct qfq_sched has been unused since its introduction in 462dbc9101acd38e92eda93c0726857517a24bbd "pkt_sched: QFQ Plus: fair-queueing service at DRR cost" and (AFAICT) there is no active plan to use it; this removes the member. Signed-off-by: Andrea Parri <parri.andrea@gmail.com> Acked-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: qlcnic: Deletion of unnecessary memsetChristophe Jaillet2015-07-161-1/+0Star
| | | | | | | | | | | | | | | | There is no need to memset memory allocated with vzalloc. Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'gianfar_rx_sg'David S. Miller2015-07-163-241/+330
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Claudiu Manoil says: ==================== gianfar: Add Rx S/G This patch-set introduces scatter/gather support on the Rx side, addressing Rx path performance issues in the driver. Thanks. As an example, two boards connected back-to-back were used to measure the throughput, running the same kernel 4.1, before and after applying these patches. The netperf UDP_STREAM results below show that the bottleneck lies on the Rx side BEFORE applying the patches, and that the Rx throughput is even lower with a larger MTU. AFTER applying the patches the Rx bottleneck is gone (Rx throughput matches the Tx one) and the RX throughput is not influenced by MTU size any longer (as expected). BEFORE: 1) MTU 1500 (default) root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 512 150.00 20119124 0 549.4 100.00 14.911 163840 150.00 14057349 383.9 100.00 14.911 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 64 150.00 23654013 0 80.7 100.00 101.463 163840 150.00 15875288 54.2 100.00 101.463 2) MTU 8000 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 512 150.00 20067232 0 548.0 100.00 14.950 163840 150.00 6113498 166.9 99.95 14.942 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 64 150.00 23621279 0 80.6 100.00 101.604 163840 150.00 5868602 20.0 99.96 101.563 AFTER: (both MTU 1500 and MTU 8000) root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 512 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 512 150.00 19914969 0 543.8 100.00 15.064 163840 150.00 19914969 543.8 99.35 14.966 root@p1010rdb-pb:~# netperf -l 150 -cC -H 192.85.1.1 -p 12867 -t UDP_STREAM -- -m 64 MIGRATED UDP STREAM TEST from 0.0.0.0 () port 0 AF_INET to 192.85.1.1 () port 0 AF_INET Socket Message Elapsed Messages CPU Service Size Size Time Okay Errors Throughput Util Demand bytes bytes secs # # 10^6bits/sec % SS us/KB 163840 64 150.00 23433989 0 80.0 100.00 102.416 163840 150.00 23433989 80.0 99.62 102.023 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: Add paged allocation and Rx S/GClaudiu Manoil2015-07-163-144/+208
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The eTSEC h/w is capable of scatter/gather on the receive side too if MAXFRM > MRBLR, when the allowed maximum Rx frame size is set to be greater than the maximum Rx buffer size (MRBLR). It's about time the driver makes use of this h/w capability, by supporting fixed buffer sizes and Rx S/G. The buffer size given to eTSEC for reception is fixed to 1536B (must be multiple of 64), which is the same default buffer size as before, used to accommodate standard MTU (1500B) size frames. As before, eTSEC can receive frames of up to 9600B. Individual Rx buffers are mapped to page halves (page size for eTSEC systems is 4KB). The skb is built around the first buffer of a frame (using build_skb()). In case the frame spans multiple buffers, the trailing buffers are added as Rx fragments to the skb. The last buffer in frame is marked by the L status flag. A mechanism is in place to reuse the pages owned by the driver (for Rx) for subsequent receptions. Supporting fixed size buffers allows the implementation of Rx S/G, which in turn removes the memory pressure issues the driver had before when MTU was set for jumbo frame reception. Also, in most cases, the Rx path becomes faster due to Rx page reusal, since the overhead of allocating new rx buffers is removed from the fast path. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: Use ndev, more Rx path cleanupClaudiu Manoil2015-07-162-32/+26Star
| | | | | | | | | | | | | | | | | | | | | | | | | | Use "ndev" instead of "dev", as the rx queue back pointer to a net_device struct, to avoid name clashing with a "struct device" reference. This prepares the addition of a "struct device" back pointer to the rx queue structure. Remove duplicated rxq registration in the process. Move napi_gro_receive() outside gfar_process_frame(). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: Fix and cleanup rxbd status handlingClaudiu Manoil2015-07-161-16/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several (long standing) problems about how the status field of the rx buffer descriptor (rxbd) is currently handled on the error path: - too many unnecessary 16bit reads of the two halves of the rxbd status field (32bit), also resulting in overuse of endianness convesion macros; - "bdp->status = RXBD_LARGE" makes no sense, since the "large" flag is read only (only eTSEC can write it), and trying to clear the other status bits is also error prone in this context (most of the rx status bits are read only anyway). This is fixed with a single 32bit read of the "status" field, and then the appropriate 16bit shifting is applied to access the various status bits or the rx frame length. Also corrected the use of the RXBD_LARGE flag. Additional fix: "rx_over_errors" stat is incremented instead of "rx_crc_errors" in case of RXBD_OVERRUN occurrence. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * gianfar: Bundle Rx allocation, cleanupClaudiu Manoil2015-07-163-107/+136
|/ | | | | | | | | | | | | | | | Use a more common consumer/ producer index design to improve rx buffer allocation. Instead of allocating a single new buffer (skb) on each iteration, bundle the allocation of several rx buffers at a time. This also opens the path for further memory optimizations. Remove useless check of rxq->rfbptr, since this patch touches rx pause frame handling code as well. rxq->rfbptr is always initialized as part of Rx BD ring init. Remove redundant (and misleading) 'amount_pull' parameter. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ravb: kill useless initializersSergei Shtylyov2015-07-151-6/+6
| | | | | | | | Some of the local variable intializers in the driver turned out to be pointless, kill them. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-07-14904-16745/+28373
|\ | | | | | | | | | | | | | | | | | | | | Conflicts: net/bridge/br_mdb.c Minor conflict in br_mdb.c, in 'net' we added a memset of the on-stack 'ip' variable whereas in 'net-next' we assign a new member 'vid'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2015-07-1366-394/+702
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Missing list head init in bluetooth hidp session creation, from Tedd Ho-Jeong An. 2) Don't leak SKB in bridge netfilter error paths, from Florian Westphal. 3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien Grall. 4) Fix regression in IP over hamradio bpq encapsulation, from Ralf Baechle. 5) Fix race between rhashtable resize events and table walks, from Phil Sutter. 6) Missing validation of IFLA_VF_INFO netlink attributes, fix from Daniel Borkmann. 7) Missing security layer socket state initialization in tipc code, from Stephen Smalley. 8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from Denys Vlasenko. 9) Missing minor_idr destroy on module unload on macvtap driver, from Johannes Thumshirn. 10) Various pktgen kernel thread races, from Oleg Nesterov. 11) Fix races that can cause packets to be processed in the backlog even after a device attached to that SKB has been fully unregistered. From Julian Anastasov. 12) bcmgenet driver doesn't account packet drops vs. errors properly, fix from Petri Gynther. 13) Array index validation and off by one fix in DSA layer from Florian Fainelli * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits) can: replace timestamp as unique skb attribute ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux can: c_can: Fix default pinmux glitch at init can: rcar_can: unify error messages can: rcar_can: print request_irq() error code can: rcar_can: fix typo in error message can: rcar_can: print signed IRQ # can: rcar_can: fix IRQ check net: dsa: Fix off-by-one in switch address parsing net: dsa: Test array index before use net: switchdev: don't abort unsupported operations net: bcmgenet: fix accounting of packet drops vs errors cdc_ncm: update specs URL Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets bridge: mdb: allow the user to delete mdb entry if there's a querier net: call rcu_read_lock early in process_backlog net: do not process device backlog during unregistration bridge: fix potential crash in __netdev_pick_tx() net: axienet: Fix devm_ioremap_resource return value check ...
| | * Merge tag 'linux-can-fixes-for-4.2-20150712' of ↵David S. Miller2015-07-1311-29/+42
| | |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2015-07-12 this is a pull request of 8 patchs for net/master. Sergei Shtylyov contributes 5 patches for the rcar_can driver, fixing the IRQ check and several info and error messages. There are two patches by J.D. Schroeder and Roger Quadros for the c_can driver and dra7x-evm device tree, which precent a glitch in the DCAN1 pinmux. Oliver Hartkopp provides a better approach to make the CAN skbs unique, the timestamp is replaced by a counter. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | | * can: replace timestamp as unique skb attributeOliver Hartkopp2015-07-127-17/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 514ac99c64b "can: fix multiple delivery of a single CAN frame for overlapping CAN filters" requires the skb->tstamp to be set to check for identical CAN skbs. Without timestamping to be required by user space applications this timestamp was not generated which lead to commit 36c01245eb8 "can: fix loss of CAN frames in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs by introducing several __net_timestamp() calls. This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb() to add __net_timestamp() after skbuff creation to prevent the frame loss fixed in mainline Linux. This patch removes the timestamp dependency and uses an atomic counter to create an unique identifier together with the skbuff pointer. Btw: the new skbcnt element introduced in struct can_skb_priv has to be initialized with zero in out-of-tree drivers which are not using alloc_can{,fd}_skb() too. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | | * ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmuxRoger Quadros2015-07-122-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Driver core sets "default" pinmux on on probe and CAN driver sets "sleep" pinmux during register. This causes a small window where the CAN pins are in "default" state with the DCAN module being disabled. Change the "default" state to be like sleep so this glitch is avoided. Add a new "active" state that is used by the driver when CAN is actually active. Signed-off-by: Roger Quadros <rogerq@ti.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | | * can: c_can: Fix default pinmux glitch at initJ.D. Schroeder2015-07-121-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous change 3973c526ae9c (net: can: c_can: Disable pins when CAN interface is down) causes a slight glitch on the pinctrl settings when used. Since commit ab78029 (drivers/pinctrl: grab default handles from device core), the device core will automatically set the default pins. This causes the pins to be momentarily set to the default and then to the sleep state in register_c_can_dev(). By adding an optional "enable" state, boards can set the default pin state to be disabled and avoid the glitch when the switch from default to sleep first occurs. If the "enable" state is not available c_can_pinctrl_select_state() falls back to using the "default" pinctrl state. [Roger Q] - Forward port to v4.2 and use pinctrl_get_select(). Signed-off-by: J.D. Schroeder <jay.schroeder@garmin.com> Signed-off-by: Roger Quadros <rogerq@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | | * can: rcar_can: unify error messagesSergei Shtylyov2015-07-121-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All the error messages in the driver but the ones from devm_clk_get() failures use similar format. Make those two messages consitent with others. Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | | * can: rcar_can: print request_irq() error codeSergei Shtylyov2015-07-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also print the error code when the request_irq() call fails in rcar_can_open(), rewording the error message... Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | | * can: rcar_can: fix typo in error messageSergei Shtylyov2015-07-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix typo in the first error message printed by rcar_can_open(). Based on the original patch by Vladimir Barinov. Fixes: 862e2b6af941 ("can: rcar_can: support all input clocks") Reported-by: Vladimir Barinov <vladimir.barinov@cogentembedded.com> Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | | * can: rcar_can: print signed IRQ #Sergei Shtylyov2015-07-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Printing IRQ # using "%x" and "%u" unsigned formats isn't quite correct as 'ndev->irq' is of type *int*, so the "%d" format needs to be used instead. While fixing this, beautify the dev_info() message in rcar_can_probe() a bit. Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
| | | * can: rcar_can: fix IRQ checkSergei Shtylyov2015-07-121-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rcar_can_probe() regards 0 as a wrong IRQ #, despite platform_get_irq() that it calls returns negative error code in that case. This leads to the following being printed to the console when attempting to open the device: error requesting interrupt fffffffa because rcar_can_open() calls request_irq() with a negative IRQ #, and that function naturally fails with -EINVAL. Check for the negative error codes instead and propagate them upstream instead of just returning -ENODEV. Fixes: fd1159318e55 ("can: add Renesas R-Car CAN driver") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Cc: linux-stable <stable@vger.kernel.org> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>