Commit message (Author, Age, Files, Lines)
* ice: Change message level (Mitch Williams, 2019-05-29, 1 file, -1/+1)
| | | | | | | | | | Change the message level of the MTU change log message from debug to info. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* ice: Check all VFs for MDD activity, don't disable (Mitch Williams, 2019-05-29, 1 file, -12/+11)
| Don't use the mdd_detected variable as an exit condition for this loop; the first VF to NOT have an MDD event will cause the loop to terminate. Instead, look at all of the VFs, but don't disable them: disabling a VF prevents proper release of resources if the VFs are rebooted or the VF driver is reloaded. Just log a message and call out repeat offenders. To make it clear what we are doing, use a differently-named variable in the loop. Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* ice: Refactor interrupt tracking (Brett Creeley, 2019-05-29, 6 files, -218/+263)
| Currently we have two MSI-x (IRQ) trackers, one for OS requested MSI-x entries (sw_irq_tracker) and one for hardware MSI-x vectors (hw_irq_tracker). Generally the sw_irq_tracker has fewer entries than the hw_irq_tracker because the hw_irq_tracker has entries equal to the max allowed MSI-x per PF and the sw_irq_tracker is mainly the minimum (non SR-IOV portion of the vectors, kernel granted IRQs). All of the non SR-IOV portions of the driver (e.g. LAN queues, RDMA queues, OICR) take at least one of each type of tracker resource. SR-IOV only grabs entries from the hw_irq_tracker.
| There are a few issues with this approach that can be seen when doing any kind of device reconfiguration (e.g. ethtool -L, SR-IOV). One of them is that any time the driver creates an ice_q_vector and associates it to a LAN queue pair, it will grab and use one entry from the hw_irq_tracker and one from the sw_irq_tracker. If the indices of these do not match, it will cause a Tx timeout, which will cause a reset, after which the indices will match up again and traffic will resume. The mismatched indices come from the trackers not being the same size and/or the search_hint in the two trackers not being equal.
| Another reason for the refactor is the co-existence of features with SR-IOV. If SR-IOV is enabled and the interrupts are taken from the end of the sw_irq_tracker, then other features can no longer use this space because the hardware has now given the remaining interrupts to SR-IOV.
| This patch reworks how we track MSI-x vectors by removing the hw_irq_tracker completely; MSI-x resources needed for SR-IOV are instead determined all at once rather than per VF. This can be done because when creating VFs we know how many are wanted and how many MSI-x vectors each VF needs. This also allows us to start using MSI-x resources from the end of the PF's allowed MSI-x vectors so we are less likely to use entries needed for other features (e.g. RDMA, L2 Offload).
| This patch also reworks the ice_res_tracker structure by removing the search_hint and adding a new member - "end". Instead of having a search_hint we will always search from 0. The new member, "end", will be used to manipulate the end of the ice_res_tracker (specifically sw_irq_tracker) during runtime based on MSI-x vectors needed by SR-IOV. In the normal case, the end of ice_res_tracker will be equal to the ice_res_tracker's num_entries.
| The sriov_base_vector member was added to the PF structure. It is used to represent the starting MSI-x index of all the needed MSI-x vectors for all SR-IOV VFs. Depending on how many MSI-x are needed, SR-IOV may have to take resources from the sw_irq_tracker. This is done by setting the sw_irq_tracker->end equal to the pf->sriov_base_vector. When all SR-IOV VFs are removed, the sw_irq_tracker->end is reset back to sw_irq_tracker->num_entries.
| The sriov_base_vector, along with the VF's number of MSI-x (pf->num_vf_msix), vf_id, and the base MSI-x index on the PF (pf->hw.func_caps.common_cap.msix_vector_first_id), is used to calculate the first HW absolute MSI-x index for each VF, which is used to write to the VPINT_ALLOC[_PCI] and GLINT_VECT2FUNC registers to program the VF's MSI-x PCI configuration bits. Also, the sriov_base_vector is used along with the VF's num_vf_msix, vf_id, and q_vector->v_idx to determine the MSI-x register index (used for writing to GLINT_DYN_CTL) within the PF's space.
| Interrupt changes removed any references to hw_base_vector, hw_oicr_idx, and hw_irq_tracker. Only sw_base_vector, sw_oicr_idx, and sw_irq_tracker variables remain. Change all of these by removing the "sw_" prefix to help avoid confusion with these variables and their use.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
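The index arithmetic described in the interrupt-tracking message can be sketched as follows. This is a minimal illustration, not the driver's code: the commit only says which values are combined, so the exact formula (base vector plus vf_id times vectors-per-VF, then plus the PF's first MSI-x id) is an assumption, and `struct pf_sketch` and both helper names are hypothetical.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the PF fields named in the message. */
struct pf_sketch {
	int sriov_base_vector;    /* first PF-relative MSI-x index reserved for SR-IOV */
	int num_vf_msix;          /* MSI-x vectors assigned to each VF */
	int msix_vector_first_id; /* base MSI-x index of this PF within the device */
};

/* PF-relative index of the first vector owned by VF vf_id (assumed layout:
 * VFs are packed back to back starting at sriov_base_vector). */
static int vf_first_vector_in_pf(const struct pf_sketch *pf, int vf_id)
{
	assert(vf_id >= 0 && pf->num_vf_msix > 0);
	return pf->sriov_base_vector + vf_id * pf->num_vf_msix;
}

/* Device-absolute index: the PF-relative index shifted by the PF's base. */
static int vf_first_vector_abs(const struct pf_sketch *pf, int vf_id)
{
	return vf_first_vector_in_pf(pf, vf_id) + pf->msix_vector_first_id;
}
```

With a base vector of 100, 5 vectors per VF and a PF base of 1000, VF 2 would start at PF-relative index 110 and absolute index 1110 under these assumptions.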
* ice: Add handler for ethtool selftest (Anirudh Venkataramanan, 2019-05-29, 11 files, -13/+752)
| | | | | | | | | This patch adds a handler for ethtool selftest. Selftest includes testing link, interrupts, eeprom, registers and packet loopback. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* ice: Don't call ice_cfg_itr() for SR-IOV (Brett Creeley, 2019-05-29, 1 file, -1/+2)
| ice_cfg_itr() sets the ITR granularity and default ITR values for the PF's interrupt vectors. For VFs this will be done in the AVF driver flow. Fix this by not calling ice_cfg_itr() for SR-IOV. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* ice: Set minimum default Rx descriptor count to 512 (Brett Creeley, 2019-05-29, 1 file, -6/+13)
| | | | | | | | | | | | | | Currently we set the default number of Rx descriptors per queue to the system's page size divided by the number of bytes per descriptor. For 4K page size systems this is resulting in 128 Rx descriptors per queue. This is causing more dropped packets than desired in the default configuration. Fix this by setting the minimum default Rx descriptor count per queue to 512. Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
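The arithmetic behind the descriptor-count message can be sketched like this. It is an illustration only: the 32-byte descriptor size is inferred from the figures in the message (4096 / 128 = 32), and the function and macro names are hypothetical rather than taken from the driver.

```c
#include <assert.h>

#define MIN_DFLT_RX_DESC 512u /* floor named in the commit message */

/* Default Rx descriptor count per queue: one page worth of descriptors,
 * but never fewer than the floor. */
static unsigned int dflt_rx_desc_count(unsigned int page_size,
				       unsigned int desc_size)
{
	unsigned int per_page;

	assert(desc_size > 0);
	per_page = page_size / desc_size;
	return per_page > MIN_DFLT_RX_DESC ? per_page : MIN_DFLT_RX_DESC;
}
```

On a 4K-page system the page-derived value is 128, so the floor of 512 applies; on a 64K-page system the page-derived value of 2048 is kept.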
* ice: Resolve static analysis warning (Bruce Allan, 2019-05-29, 1 file, -4/+4)
| | | | | | | | | | Some static analysis tools can complain when doing a bitop assignment using operands of different sizes. Fix that. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* ice: Implement toggling ethtool rx-vlan-filter (Tony Nguyen, 2019-05-29, 1 file, -0/+7)
| | | | | | | | | | Implement the toggling of rx-vlan-filter; enable|disable VLAN pruning based on on|off, respectively. Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* ice: Remove direct write for GLLAN_RCTL_0 (Anirudh Venkataramanan, 2019-05-29, 1 file, -3/+0)
| | | | | | | | | Clear PXE mode AQ call (opcode 0x0110) is now supported in FW. So remove the direct register write to GLLAN_RCTL_0. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* ice: Fix LINE_SPACING style issue (Bruce Allan, 2019-05-29, 1 file, -1/+0)
| | | | | | | | | | Fix a checkpatch "LINE_SPACING: Please don't use multiple blank lines" issue that has snuck in to the code. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* macvlan: Replace strncpy() by strscpy() (Gustavo A. R. Silva, 2019-05-29, 1 file, -1/+1)
| | | | | | | | | | | | | | | | The strncpy() function is being deprecated. Replace it by the safer strscpy() and fix the following Coverity warning: "Calling strncpy with a maximum size argument of 16 bytes on destination array ifrr.ifr_ifrn.ifrn_name of size 16 bytes might leave the destination string unterminated." Notice that, unlike strncpy(), strscpy() always null-terminates the destination string. Addresses-Coverity-ID: 1445537 ("Buffer not null terminated") Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
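The difference the macvlan message relies on can be shown with a small userspace sketch. The kernel's strscpy() is not available outside the kernel, so `sketch_strscpy()` below is a hypothetical stand-in that mimics its documented behaviour (always terminate, report truncation); it returns -1 where the kernel function returns -E2BIG.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's strscpy(): copies at most size - 1
 * characters, always NUL-terminates when size > 0, and returns the number
 * of characters copied or -1 if the source had to be truncated. */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -1;
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';
	return src[i] == '\0' ? (long)i : -1;
}
```

By contrast, strncpy() given a 16-character source and a 16-byte destination fills all 16 bytes and writes no terminator, which is the case the Coverity warning describes.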
* Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue (David S. Miller, 2019-05-29, 8 files, -94/+47)
|\
| | Jeff Kirsher says:
| | ====================
| | 1GbE Intel Wired LAN Driver Updates 2019-05-28
| | This series contains updates to e1000e, igb and igc.
| | Feng adds additional information on a warning message when a read of a hardware register fails.
| | Gustavo A. R. Silva fixes up two "fall through" code comments so that the checkers can actually determine that we did comment that the case statement is falling through to the next case.
| | Sasha cleans up the igc driver by removing duplicate white space and an unneeded workaround, and adds support for flow control to the igc driver.
| | Konstantin Khlebnikov reverts a previous fix which was causing a false positive for a hardware hang. He also provides a fix so that when link is lost the packets in the transmit queue are flushed, and the transmit queue is woken only when the NIC is ready to send packets.
| | ====================
| | Signed-off-by: David S. Miller <davem@davemloft.net>
| * igc: Cleanup the redundant code (Sasha Neftin, 2019-05-29, 1 file, -20/+3)
| | The default flow control setting for the i225 device is both 'rx' and 'tx' pause frames. It does not depend on the NVM value. This patch fixes this and cleans up the driver code. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * igc: Add flow control support (Sasha Neftin, 2019-05-29, 2 files, -0/+24)
| | | | | | | | | | | | | | | | | | This change adds flow control settings. This is required to enable the legacy flow control support. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * e1000e: start network tx queue only when link is up (Konstantin Khlebnikov, 2019-05-29, 1 file, -2/+4)
| | The driver does not want to keep packets in the Tx queue when link is lost. But the present code only resets the NIC to flush them; it does not prevent queuing new packets. Moreover, the reset sequence itself could generate new packets via netconsole, and the NIC falls into an endless reset loop. This patch wakes the Tx queue only when the NIC is ready to send packets. This is the proper fix for the problem addressed by commit 0f9e980bf5ee ("e1000e: fix cyclic resets at link up with active tx"). Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Tested-by: Joseph Yasi <joe.yasi@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * Revert "e1000e: fix cyclic resets at link up with active tx" (Konstantin Khlebnikov, 2019-05-29, 1 file, -6/+9)
| | This reverts commit 0f9e980bf5ee1a97e2e401c846b2af989eb21c61. That change caused a false-positive warning about a hardware hang:
| |
| | e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
| | IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
| | e1000e 0000:00:1f.6 eth0: Detected Hardware Unit Hang:
| |   TDH <0>
| |   TDT <1>
| |   next_to_use <1>
| |   next_to_clean <0>
| | buffer_info[next_to_clean]:
| |   time_stamp <fffba7a7>
| |   next_to_watch <0>
| |   jiffies <fffbb140>
| |   next_to_watch.status <0>
| | MAC Status <40080080>
| | PHY Status <7949>
| | PHY 1000BASE-T Status <0>
| | PHY Extended Status <3000>
| | PCI Status <10>
| | e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
| |
| | Besides the warning, everything works fine. The original issue will be fixed properly in the following patch.
| | Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reported-by: Joseph Yasi <joe.yasi@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=203175 Tested-by: Joseph Yasi <joe.yasi@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Tested-by: Oleksandr Natalenko <oleksandr@redhat.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * igc: Remove the obsolete workaround (Sasha Neftin, 2019-05-29, 2 files, -58/+3)
| | The "enable a resend request after the completion timeout" workaround is not relevant for the i225 device. This patch removes the code related to this workaround. Minor cosmetic fixes: replace spaces with tabs. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * igc: Clean up unused pointers (Sasha Neftin, 2019-05-29, 1 file, -3/+0)
| | A few function pointers in the phy_operations structure were unused. This patch removes them. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * igc: Fix double definitions (Sasha Neftin, 2019-05-29, 1 file, -3/+0)
| | The collision threshold and the threshold's shift have been defined twice. This patch fixes that. Signed-off-by: Sasha Neftin <sasha.neftin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * igb: mark expected switch fall-through (Gustavo A. R. Silva, 2019-05-29, 1 file, -1/+1)
| | In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning:
| |
| | drivers/net/ethernet/intel/igb/e1000_82575.c: In function ‘igb_get_invariants_82575’:
| | drivers/net/ethernet/intel/igb/e1000_82575.c:636:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
| |    if (igb_sgmii_uses_mdio_82575(hw)) {
| |       ^
| | drivers/net/ethernet/intel/igb/e1000_82575.c:642:2: note: here
| |   case E1000_CTRL_EXT_LINK_MODE_PCIE_SERDES:
| |   ^~~~
| |
| | Warning level 3 was used: -Wimplicit-fallthrough=3
| | Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
| | Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
| * igb: mark expected switch fall-through (Gustavo A. R. Silva, 2019-05-29, 1 file, -1/+1)
| | In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warning:
| |
| | drivers/net/ethernet/intel/igb/igb_main.c: In function ‘__igb_notify_dca’:
| | drivers/net/ethernet/intel/igb/igb_main.c:6694:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
| |    if (dca_add_requester(dev) == 0) {
| |       ^
| | drivers/net/ethernet/intel/igb/igb_main.c:6701:2: note: here
| |   case DCA_PROVIDER_REMOVE:
| |   ^~~~
| |
| | Warning level 3 was used: -Wimplicit-fallthrough=3
| | Notice that, in this particular case, the code comment is modified in accordance with what GCC is expecting to find. This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough.
| | Signed-off-by: "Gustavo A. R. Silva" <gustavo@embeddedor.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
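The pattern both igb fall-through commits annotate looks roughly like the sketch below: a case whose conditional branch ends in `break`, followed by an intentional fall into the next label. The enum and function are hypothetical and only mirror the shape of the warned-about code; the `/* fall through */` comment placed directly before the next `case` is the form GCC recognises at -Wimplicit-fallthrough=3.

```c
#include <assert.h>

/* Hypothetical link modes, loosely modelled on the igb case labels. */
enum link_mode { MODE_SGMII, MODE_SERDES, MODE_OTHER };

/* Returns a small code describing which setup paths were taken. */
static int setup_steps(enum link_mode mode, int uses_mdio)
{
	int steps = 0;

	assert(uses_mdio == 0 || uses_mdio == 1);
	switch (mode) {
	case MODE_SGMII:
		if (uses_mdio) {
			steps += 10; /* MDIO-specific setup, then done */
			break;
		}
		/* fall through */
	case MODE_SERDES:
		steps += 1; /* setup shared by both modes */
		break;
	default:
		break;
	}
	return steps;
}
```

Without the comment, GCC reports "this statement may fall through" at the `if`, exactly as in the quoted warnings.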
| * igb/igc: warn when fatal read failure happens (Feng Tang, 2019-05-29, 2 files, -0/+2)
| | Failing to read a HW register is very serious for the igb/igc drivers, as hw_addr will be set to NULL and cause the adapter to be seen as "REMOVED". We saw the error only a few times in the MTBF test for suspend/resume, but could hardly get any useful info to debug it. Add WARN() so that we can get the necessary information about where and how it happens, and use it for root causing and fixing this "PCIe link lost issue". This affects igb, igc. Signed-off-by: Feng Tang <feng.tang@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Acked-by: Sasha Neftin <sasha.neftin@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
* | Merge branch 'net-API-and-initial-implementation-for-nexthop-objects' (David S. Miller, 2019-05-29, 8 files, -2/+1765)
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Ahern says: ==================== net: API and initial implementation for nexthop objects This set contains the API and initial implementation for nexthops as standalone objects. Patch 1 contains the UAPI and updates to selinux struct. Patch 2 contains the barebones code for nexthop commands, rbtree maintenance and notifications. Patch 3 then adds support for IPv4 gateways along with handling of netdev events. Patch 4 adds support for IPv6 gateways. Patch 5 has the implementation of the encap attributes. Patch 6 adds support for nexthop groups. At the end of this set, nexthop objects can be created and deleted and userspace can monitor nexthop events, but ipv4 and ipv6 routes can not use them yet. Once the nexthop struct is defined, follow on sets add it to fib{6}_info and handle it within the respective code before routes can be inserted using them. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | nexthop: Add support for nexthop groups (David Ahern, 2019-05-29, 2 files, -24/+578)
| | | Allow the creation of nexthop groups which reference other nexthop objects to create multipath routes:
| | | [Diagram: nh_grp ---> nh_grp_entry ---> (nh, weight), with an nh_parent reference from the entry back to its parent]
| | | A group entry points to a nexthop with a weight for that hop within the group. The nexthop has a list_head, grp_list, for tracking which groups it is a member of, and the group entry has a reference back to the parent. The grp_list is used when a nexthop is deleted, to efficiently remove it from the groups using it.
| | | If a nexthop group spec is given, no other attributes can be set. Each nexthop id in a group spec must already exist.
| | | Similar to single nexthops, the specification of a nexthop group can be updated, so that data is managed with rcu locking.
| | | Add a path selection function to account for multiple paths, and add ipv{4,6}_good_nh helpers to know that if a neighbor entry exists it is in a good state.
| | | Update NETDEV event handling to rebalance multipath nexthop groups if a nexthop is deleted due to a link event (down or unregister). When a nexthop is removed, any groups using it are updated. Groups using a nexthop are tracked via a grp_list.
| | | Nexthop dumps can be limited to groups only by adding NHA_GROUPS to the request.
| | | Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | nexthop: Add support for lwt encaps (David Ahern, 2019-05-29, 2 files, -1/+39)
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for NHA_ENCAP and NHA_ENCAP_TYPE. Leverages the existing code for lwtunnel within fib_nh_common, so the only change needed is handling the attributes in the nexthop code. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | nexthop: Add support for IPv6 gateways (David Ahern, 2019-05-29, 2 files, -0/+59)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Handle IPv6 gateway in a nexthop spec. If nh_family is set to AF_INET6, NHA_GATEWAY is expected to be an IPv6 address. Add ipv6 option to gw in nh_config to hold the address, add fib6_nh to nh_info to leverage the ipv6 initialization and cleanup code. Update nh_fill_node to dump the v6 address. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | nexthop: Add support for IPv4 nexthops (David Ahern, 2019-05-29, 2 files, -0/+213)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for IPv4 nexthops. If nh_family is set to AF_INET, then NHA_GATEWAY is expected to be an IPv4 address. Register for netdev events to be notified of admin up/down changes as well as deletes. A hash table is used to track nexthop per devices to quickly convert device events to the affected nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: Initial nexthop code (David Ahern, 2019-05-29, 5 files, -1/+831)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Barebones start point for nexthops. Implementation for RTM commands, notifications, management of rbtree for holding nexthops by id, and kernel side data structures for nexthops and nexthop config. Nexthops are maintained in an rbtree sorted by id. Similar to routes, nexthops are configured per namespace using netns_nexthop struct added to struct net. Nexthop notifications are sent when a nexthop is added or deleted, but NOT if the delete is due to a device event or network namespace teardown (which also involves device events). Applications are expected to use the device down event to flush nexthops and any routes used by the nexthops. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: nexthop uapi (David Ahern, 2019-05-29, 3 files, -1/+70)
|/ /
| | New UAPI for nexthops as standalone objects:
| | - defines netlink ancillary header, struct nhmsg
| | - RTM commands for nexthop objects, RTM_*NEXTHOP,
| | - RTNLGRP for nexthop notifications, RTNLGRP_NEXTHOP,
| | - Attributes for creating nexthops, NHA_*
| | - Attribute for route specs to specify a nexthop by id, RTA_NH_ID.
| | The nexthop attributes and semantics follow the route and RTA ones for device, gateway and lwt encap. Unique to nexthop objects are a blackhole and a group which contains references to other nexthop objects. With the exception of blackhole and group, nexthop objects MUST contain a device. Gateway and encap are optional. Nexthop groups can only reference other pre-existing nexthops by id.
| | If the NHA_ID attribute is present that id is used for the nexthop. If not specified, one is auto assigned.
| | Dump requests can include attributes:
| | - NHA_GROUPS to return only nexthop groups,
| | - NHA_MASTER to limit dumps to nexthops with devices enslaved to the given master (e.g., VRF)
| | - NHA_OIF to limit dumps to nexthops using given device
| | nlmsg_route_perms in selinux code is updated for the new RTM commands.
| | Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'hns3-next' (David S. Miller, 2019-05-29, 12 files, -80/+213)
|\ \
| | | Huazhong Tan says:
| | | ====================
| | | code optimizations & bugfixes for HNS3 driver
| | | This patch-set includes code optimizations and bugfixes for the HNS3 ethernet controller driver.
| | | [patch 1/12] fixes a compile warning reported by kbuild test robot.
| | | [patch 2/12] fixes HNS3_RXD_GRO_SIZE_M macro definition error.
| | | [patch 3/12] adds a debugfs command to dump firmware information.
| | | [patch 4/12 - 10/12] adds some code optimizations and cleanups for reset and driver unloading.
| | | [patch 11/12 - 12/12] adds two bugfixes.
| | | ====================
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: fix a memory leak issue for hclge_map_unmap_ring_to_vf_vector (Huazhong Tan, 2019-05-29, 1 file, -3/+1)
| | | When hclge_bind_ring_with_vector() fails, hclge_map_unmap_ring_to_vf_vector() returns the error directly, so nobody frees the memory allocated by hclge_get_ring_chain_from_mbx(). hclge_free_vector_ring_chain() should therefore be called whether hclge_bind_ring_with_vector() fails or not. Fixes: 84e095d64ed9 ("net: hns3: Change PF to add ring-vect binding & resetQ to mailbox") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
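The shape of the leak fix described above can be sketched in plain C. All names here are hypothetical stand-ins (the real functions operate on mailbox requests and ring chains); the point is only that the free call sits on the common path, so it runs on both the success and the failure of the bind step.

```c
#include <assert.h>
#include <stdlib.h>

static int live_chains; /* outstanding allocations, for demonstration only */

struct ring_chain { int ring_id; };

static struct ring_chain *get_ring_chain(int ring_id)
{
	struct ring_chain *c = malloc(sizeof(*c));

	if (c) {
		c->ring_id = ring_id;
		live_chains++;
	}
	return c;
}

static void free_ring_chain(struct ring_chain *c)
{
	assert(c != NULL);
	free(c);
	live_chains--;
}

/* Stand-in for the bind step; fails for negative ring ids. */
static int bind_ring(const struct ring_chain *c)
{
	return c->ring_id < 0 ? -1 : 0;
}

static int map_ring(int ring_id)
{
	struct ring_chain *chain = get_ring_chain(ring_id);
	int ret;

	if (!chain)
		return -1;
	ret = bind_ring(chain);
	free_ring_chain(chain); /* runs whether bind_ring() failed or not */
	return ret;
}
```

Returning early when `bind_ring()` fails, before the free, would reproduce the leak the commit describes.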
| * | net: hns3: adjust hns3_uninit_phy()'s location in the hns3_client_uninit() (Huazhong Tan, 2019-05-29, 1 file, -2/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hns3_uninit_phy() should be called before checking HNS3_NIC_STATE_INITED flags, otherwise when this checking fails, there is nobody to call hns3_uninit_phy(). Fixes: c8a8045b2d0a ("net: hns3: Fix NULL deref when unloading driver") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: stop schedule reset service while unloading driver (Huazhong Tan, 2019-05-29, 3 files, -2/+7)
| | | When unloading the driver, the reset task should not be scheduled anymore. If the IRQ is disabled before the ongoing reset task is cancelled, the IRQ may be re-enabled by the reset task. This patch uses the HCLGE_STATE_REMOVING/HCLGEVF_STATE_REMOVING flag to indicate that the driver is unloading and that newly arriving reset service requests should not be scheduled; otherwise, the reset service will access resources that have already been freed by unloading. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add handshake with hardware while doing reset (Huazhong Tan, 2019-05-29, 4 files, -9/+16)
| | | When a reset happens, the hardware reset should begin after the driver has finished its preparatory work; otherwise it may cause a hardware error. Before resetting, the hardware waits for the driver to write bit HCLGE_NIC_CMQ_ENABLE of register HCLGE_NIC_CSQ_DEPTH_REG to 1, which the driver does once its preparatory work is finished. Because this register is cleared in some cases, some sync time is needed before the driver's write. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: modify hclgevf_init_client_instance() (Huazhong Tan, 2019-05-29, 1 file, -29/+50)
| | | | | | | | | | | | | | | | | | | | | | | | | | | hclgevf_init_client_instance() is a little bloated and there is some duplicated code. This patch adds some cleanup for it. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: modify hclge_init_client_instance() (Huazhong Tan, 2019-05-29, 1 file, -37/+53)
| | | | | | | | | | | | | | | | | | | | | | | | | | | hclge_init_client_instance() is a little bloated and there is some duplicated code. This patch adds some cleanup for it. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: use HCLGEVF_STATE_NIC_REGISTERED to indicate VF NIC client has registered (Huazhong Tan, 2019-05-29, 2 files, -0/+8)
| | | When VF NIC client's init_instance() succeeds, it means this client has been registered successfully, so we use HCLGEVF_STATE_NIC_REGISTERED to indicate that. And before calling VF NIC client's uninit_instance(), we clear this state. So any operation of VF NIC client from HCLGEVF is not allowed if this state is not set. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: use HCLGE_STATE_ROCE_REGISTERED to indicate PF ROCE client has registered (Huazhong Tan, 2019-05-29, 2 files, -1/+8)
| | | When PF ROCE client's init_instance() succeeds, it means this client has been registered successfully, so we use HCLGE_STATE_ROCE_REGISTERED to indicate that. And before calling PF ROCE client's uninit_instance(), we clear this state. So any operation of the ROCE client from HCLGE is not allowed if this state is not set. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: use HCLGE_STATE_NIC_REGISTERED to indicate PF NIC client has registered (Huazhong Tan, 2019-05-29, 2 files, -0/+7)
| | | When PF NIC client's init_instance() succeeds, it means this client has been registered successfully, so we use HCLGE_STATE_NIC_REGISTERED to indicate that. And before calling PF NIC client's uninit_instance(), we clear this state. So any operation of PF NIC client from HCLGE is not allowed if this state is not set. Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: add support for dump firmware statistics by debugfs (Zhongzhu Liu, 2019-05-29, 3 files, -0/+66)
| | | This patch prints firmware statistics information.
| | |
| | | debugfs command:
| | | echo dump m7 info > cmd
| | |
| | | estuary:/dbg/hns3/0000:7d:00.0$ echo dump m7 info > cmd
| | | [ 172.577240] hns3 0000:7d:00.0: 0x00000000 0x00000000 0x00000000
| | | [ 172.583471] hns3 0000:7d:00.0: 0x00000000 0x00000000 0x00000000
| | | [ 172.589552] hns3 0000:7d:00.0: 0x00000030 0x00000000 0x00000000
| | | [ 172.595632] hns3 0000:7d:00.0: 0x00000000 0x00000000 0x00000000
| | | estuary:/dbg/hns3/0000:7d:00.0$
| | |
| | | Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: hns3: fix for HNS3_RXD_GRO_SIZE_M macro Yunsheng Lin 2019-05-29 1 -1/+1
| | | According to the hardware user manual, GRO_SIZE is 14 bits wide, but HNS3_RXD_GRO_SIZE_M is currently only 10 bits wide, which may cause errors in packets received with hardware GRO.
| | | Fixes: a6d53b97a2e7 ("net: hns3: Adds GRO params to SKB for the stack")
| | | Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
| | | Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
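To see why a mask that is too narrow corrupts the reported size, consider this small C sketch. The field shift and all `sketch_*` names are assumptions made for illustration; only the 14-bit and 10-bit widths come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_GRO_SIZE_SHIFT 16u /* hypothetical position of the field */

/* Build a mask of `width` bits starting at bit `shift` (width < 32). */
static uint32_t sketch_field_mask(unsigned width, unsigned shift)
{
    return ((1u << width) - 1u) << shift;
}

/* Extract the field selected by `mask` from a descriptor word. */
static uint32_t sketch_get_field(uint32_t word, uint32_t mask, unsigned shift)
{
    return (word & mask) >> shift;
}

static void sketch_demo(void)
{
    /* A 1500-byte size written into a 14-bit field. */
    const uint32_t word = 1500u << SKETCH_GRO_SIZE_SHIFT;
    const uint32_t mask14 = sketch_field_mask(14, SKETCH_GRO_SIZE_SHIFT);
    const uint32_t mask10 = sketch_field_mask(10, SKETCH_GRO_SIZE_SHIFT);

    /* The full-width mask recovers the value. */
    assert(sketch_get_field(word, mask14, SKETCH_GRO_SIZE_SHIFT) == 1500u);
    /* The 10-bit mask drops the upper bits: 1500 - 1024 = 476. */
    assert(sketch_get_field(word, mask10, SKETCH_GRO_SIZE_SHIFT) == 476u);
}
```

Any size of 1024 or more loses its upper bits under the 10-bit mask, while smaller sizes read back correctly, which is consistent with the commit describing the error as something that "may" occur.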
| * | net: hns3: fix compile warning without CONFIG_RFS_ACCEL Jian Shen 2019-05-29 1 -2/+0
|/ /
| | The ifdef condition around the function hclge_add_fd_entry_by_arfs() is unnecessary, and it may cause a compile warning when CONFIG_RFS_ACCEL is not selected. This patch fixes it by removing the ifdef condition.
| | Fixes: d93ed94fbeaf ("net: hns3: add aRFS support for PF")
| | Reported-by: kbuild test robot <lkp@intel.com>
| | Signed-off-by: Jian Shen <shenjian15@huawei.com>
| | Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | hinic: fix a bug in set rx mode Xue Chaojing 2019-05-29 1 -4/+0
| | In set_rx_mode, __dev_mc_sync and netdev_for_each_mc_addr will set the multicast MAC address repeatedly, so we delete this loop.
| | Signed-off-by: Xue Chaojing <xuechaojing@huawei.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'inet-frags-followup' David S. Miller 2019-05-29 4 -23/+43
|\ \
| | | Eric Dumazet says:
| | | ====================
| | | inet: frags: followup to 'inet-frags-avoid-possible-races-at-netns-dismantle'
| | | Latest patch series ('inet-frags-avoid-possible-races-at-netns-dismantle') brought another syzbot report shown in the third patch changelog.
| | | While fixing the issue, I had to call inet_frags_fini() later in IPv6 and 6lowpan.
| | | Also I believe a completion is needed to ensure proper dismantle at module removal.
| | | ====================
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | inet: frags: fix use-after-free read in inet_frag_destroy_rcu Eric Dumazet 2019-05-29 2 -2/+21
| | | As caught by syzbot [1], the rcu grace period that is respected before fqdir_rwork_fn() proceeds and frees fqdir is not enough to prevent inet_frag_destroy_rcu() being run after the freeing.
| | | We need a proper rcu_barrier() synchronization to replace the one we had in inet_frags_fini().
| | | We also have to fix a potential problem at module removal: inet_frags_fini() needs to make sure that all queued work queues (fqdir_rwork_fn) have completed, otherwise we might call kmem_cache_destroy() too soon and get another use-after-free.
| | | [1]
| | | BUG: KASAN: use-after-free in inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201
| | | Read of size 8 at addr ffff88806ed47a18 by task swapper/1/0
| | | CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.2.0-rc1+ #2
| | | Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
| | | Call Trace:
| | |  <IRQ>
| | |  __dump_stack lib/dump_stack.c:77 [inline]
| | |  dump_stack+0x172/0x1f0 lib/dump_stack.c:113
| | |  print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
| | |  __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
| | |  kasan_report+0x12/0x20 mm/kasan/common.c:614
| | |  __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
| | |  inet_frag_destroy_rcu+0xd9/0xe0 net/ipv4/inet_fragment.c:201
| | |  __rcu_reclaim kernel/rcu/rcu.h:222 [inline]
| | |  rcu_do_batch kernel/rcu/tree.c:2092 [inline]
| | |  invoke_rcu_callbacks kernel/rcu/tree.c:2310 [inline]
| | |  rcu_core+0xba5/0x1500 kernel/rcu/tree.c:2291
| | |  __do_softirq+0x25c/0x94c kernel/softirq.c:293
| | |  invoke_softirq kernel/softirq.c:374 [inline]
| | |  irq_exit+0x180/0x1d0 kernel/softirq.c:414
| | |  exiting_irq arch/x86/include/asm/apic.h:536 [inline]
| | |  smp_apic_timer_interrupt+0x13b/0x550 arch/x86/kernel/apic/apic.c:1068
| | |  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:806
| | |  </IRQ>
| | | RIP: 0010:native_safe_halt+0xe/0x10 arch/x86/include/asm/irqflags.h:61
| | | Code: ff ff 48 89 df e8 f2 95 8c fa eb 82 e9 07 00 00 00 0f 00 2d e4 45 4b 00 f4 c3 66 90 e9 07 00 00 00 0f 00 2d d4 45 4b 00 fb f4 <c3> 90 55 48 89 e5 41 57 41 56 41 55 41 54 53 e8 8e 18 42 fa e8 99
| | | RSP: 0018:ffff8880a98e7d78 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
| | | RAX: 1ffffffff1164e11 RBX: ffff8880a98d4340 RCX: 0000000000000000
| | | RDX: dffffc0000000000 RSI: 0000000000000006 RDI: ffff8880a98d4bbc
| | | RBP: ffff8880a98e7da8 R08: ffff8880a98d4340 R09: 0000000000000000
| | | R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
| | | R13: ffffffff88b27078 R14: 0000000000000001 R15: 0000000000000000
| | |  arch_cpu_idle+0xa/0x10 arch/x86/kernel/process.c:571
| | |  default_idle_call+0x36/0x90 kernel/sched/idle.c:94
| | |  cpuidle_idle_call kernel/sched/idle.c:154 [inline]
| | |  do_idle+0x377/0x560 kernel/sched/idle.c:263
| | |  cpu_startup_entry+0x1b/0x20 kernel/sched/idle.c:354
| | |  start_secondary+0x34e/0x4c0 arch/x86/kernel/smpboot.c:267
| | |  secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243
| | | Allocated by task 8877:
| | |  save_stack+0x23/0x90 mm/kasan/common.c:71
| | |  set_track mm/kasan/common.c:79 [inline]
| | |  __kasan_kmalloc mm/kasan/common.c:489 [inline]
| | |  __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
| | |  kasan_kmalloc+0x9/0x10 mm/kasan/common.c:503
| | |  kmem_cache_alloc_trace+0x151/0x750 mm/slab.c:3555
| | |  kmalloc include/linux/slab.h:547 [inline]
| | |  kzalloc include/linux/slab.h:742 [inline]
| | |  fqdir_init include/net/inet_frag.h:115 [inline]
| | |  ipv6_frags_init_net+0x48/0x460 net/ipv6/reassembly.c:513
| | |  ops_init+0xb3/0x410 net/core/net_namespace.c:130
| | |  setup_net+0x2d3/0x740 net/core/net_namespace.c:316
| | |  copy_net_ns+0x1df/0x340 net/core/net_namespace.c:439
| | |  create_new_namespaces+0x400/0x7b0 kernel/nsproxy.c:107
| | |  unshare_nsproxy_namespaces+0xc2/0x200 kernel/nsproxy.c:206
| | |  ksys_unshare+0x440/0x980 kernel/fork.c:2692
| | |  __do_sys_unshare kernel/fork.c:2760 [inline]
| | |  __se_sys_unshare kernel/fork.c:2758 [inline]
| | |  __x64_sys_unshare+0x31/0x40 kernel/fork.c:2758
| | |  do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
| | |  entry_SYSCALL_64_after_hwframe+0x49/0xbe
| | | Freed by task 17:
| | |  save_stack+0x23/0x90 mm/kasan/common.c:71
| | |  set_track mm/kasan/common.c:79 [inline]
| | |  __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
| | |  kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
| | |  __cache_free mm/slab.c:3432 [inline]
| | |  kfree+0xcf/0x220 mm/slab.c:3755
| | |  fqdir_rwork_fn+0x33/0x40 net/ipv4/inet_fragment.c:154
| | |  process_one_work+0x989/0x1790 kernel/workqueue.c:2269
| | |  worker_thread+0x98/0xe40 kernel/workqueue.c:2415
| | |  kthread+0x354/0x420 kernel/kthread.c:255
| | |  ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
| | | The buggy address belongs to the object at ffff88806ed47a00 which belongs to the cache kmalloc-512 of size 512
| | | The buggy address is located 24 bytes inside of 512-byte region [ffff88806ed47a00, ffff88806ed47c00)
| | | The buggy address belongs to the page:
| | | page:ffffea0001bb51c0 refcount:1 mapcount:0 mapping:ffff8880aa400940 index:0x0
| | | flags: 0x1fffc0000000200(slab)
| | | raw: 01fffc0000000200 ffffea000282a788 ffffea0001bb53c8 ffff8880aa400940
| | | raw: 0000000000000000 ffff88806ed47000 0000000100000006 0000000000000000
| | | page dumped because: kasan: bad access detected
| | | Memory state around the buggy address:
| | |  ffff88806ed47900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| | |  ffff88806ed47980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
| | | >ffff88806ed47a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| | |  ^
| | |  ffff88806ed47a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| | |  ffff88806ed47b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
| | | Fixes: 3c8fc8782044 ("inet: frags: rework rhashtable dismantle")
| | | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | | Reported-by: syzbot <syzkaller@googlegroups.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | inet: frags: call inet_frags_fini() after unregister_pernet_subsys() Eric Dumazet 2019-05-29 2 -2/+2
| | | Both IPv6 and 6lowpan are calling inet_frags_fini() too soon.
| | | inet_frags_fini() is dismantling a kmem_cache that might be needed later, when unregister_pernet_subsys() eventually has to remove frags queues from hash tables and free them.
| | | This fixes a potential use-after-free, and is a prerequisite for the following patch.
| | | Fixes: d4ad4d22e7ac ("inet: frags: use kmem_cache for inet_frag_queue")
| | | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * | inet: frags: uninline fqdir_init() Eric Dumazet 2019-05-29 2 -19/+20
|/ /
| | fqdir_init() is not in the fast path and is getting bigger.
| | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | selftests/net: ipv6 flowlabel Willem de Bruijn 2019-05-29 5 -2/+453
| | Test the IPv6 flowlabel control and datapath interfaces:
| | Acquire and release the right to use flowlabels with socket option IPV6_FLOWLABEL_MGR.
| | Then configure flowlabels on send and read them on recv with cmsg IPV6_FLOWINFO. Also verify auto-flowlabel if not explicitly set.
| | This helped identify the issue fixed in commit 95c169251bf73 ("ipv6: invert flowlabel sharing check in process and user mode").
| | Signed-off-by: Willem de Bruijn <willemb@google.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
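Reading a flow label back from an IPV6_FLOWINFO control message comes down to decoding one 32-bit word in network byte order. The sketch below shows only that decoding step, under the assumption that the word follows the IPv6 header layout (8-bit traffic class above a 20-bit flow label); the `sketch_*` helpers are hypothetical and are not taken from the selftest.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* The flow label is the low 20 bits of the word once in host order. */
static uint32_t sketch_flowlabel(uint32_t flowinfo_be)
{
    return ntohl(flowinfo_be) & 0x000FFFFFu;
}

/* The traffic class is the 8 bits directly above the flow label. */
static uint32_t sketch_tclass(uint32_t flowinfo_be)
{
    return (ntohl(flowinfo_be) >> 20) & 0xFFu;
}

static void sketch_demo(void)
{
    /* Traffic class 0xAB and flow label 0x12345, as they appear on the wire. */
    const uint32_t flowinfo_be = htonl(0x0AB12345u);

    assert(sketch_flowlabel(flowinfo_be) == 0x12345u);
    assert(sketch_tclass(flowinfo_be) == 0xABu);
}
```

Converting with ntohl() before masking keeps the helpers independent of host endianness.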
* | enetc: Enable TC offloading with mqprio Camelia Groza 2019-05-29 5 -1/+72
| | Add support to configure multiple prioritized TX traffic classes with mqprio.
| | Configure one BD ring per TC for the moment, one netdev queue per TC.
| | Signed-off-by: Camelia Groza <camelia.groza@nxp.com>
| | Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
| | Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'stmmac-SPDX' David S. Miller 2019-05-29 2 -14/+2
|\ \
| |/
|/|
| | Neil Armstrong says:
| | ====================
| | net: stmmac: dwmac-meson: update with SPDX Licence identifier
| | Update the SPDX Licence identifier for the Amlogic Meson6 and Meson8 dwmac glue drivers.
| | ====================
| | Signed-off-by: David S. Miller <davem@davemloft.net>