diff options
Diffstat (limited to 'Documentation/networking')
16 files changed, 517 insertions, 259 deletions
diff --git a/Documentation/networking/batman-adv.rst b/Documentation/networking/batman-adv.rst index 245fb6c0ab6f..18020943ba25 100644 --- a/Documentation/networking/batman-adv.rst +++ b/Documentation/networking/batman-adv.rst @@ -27,24 +27,8 @@ Load the batman-adv module into your kernel:: $ insmod batman-adv.ko The module is now waiting for activation. You must add some interfaces on which -batman can operate. After loading the module batman advanced will scan your -systems interfaces to search for compatible interfaces. Once found, it will -create subfolders in the ``/sys`` directories of each supported interface, -e.g.:: - - $ ls /sys/class/net/eth0/batman_adv/ - elp_interval iface_status mesh_iface throughput_override - -If an interface does not have the ``batman_adv`` subfolder, it probably is not -supported. Not supported interfaces are: loopback, non-ethernet and batman's -own interfaces. - -Note: After the module was loaded it will continuously watch for new -interfaces to verify the compatibility. There is no need to reload the module -if you plug your USB wifi adapter into your machine after batman advanced was -initially loaded. - -The batman-adv soft-interface can be created using the iproute2 tool ``ip``:: +batman-adv can operate. The batman-adv soft-interface can be created using the +iproute2 tool ``ip``:: $ ip link add name bat0 type batadv @@ -52,57 +36,46 @@ To activate a given interface simply attach it to the ``bat0`` interface:: $ ip link set dev eth0 master bat0 -Repeat this step for all interfaces you wish to add. Now batman starts +Repeat this step for all interfaces you wish to add. Now batman-adv starts using/broadcasting on this/these interface(s). -By reading the "iface_status" file you can check its status:: - - $ cat /sys/class/net/eth0/batman_adv/iface_status - active - To deactivate an interface you have to detach it from the "bat0" interface:: $ ip link set dev eth0 nomaster +The same can also be done using the batctl interface subcommand:: -All mesh wide settings can be found in batman's own interface folder:: + batctl -m bat0 interface create + batctl -m bat0 interface add -M eth0 - $ ls /sys/class/net/bat0/mesh/ - aggregated_ogms fragmentation isolation_mark routing_algo - ap_isolation gw_bandwidth log_level vlan0 - bonding gw_mode multicast_mode - bridge_loop_avoidance gw_sel_class network_coding - distributed_arp_table hop_penalty orig_interval +To detach eth0 and destroy bat0:: -There is a special folder for debugging information:: + batctl -m bat0 interface del -M eth0 + batctl -m bat0 interface destroy - $ ls /sys/kernel/debug/batman_adv/bat0/ - bla_backbone_table log neighbors transtable_local - bla_claim_table mcast_flags originators - dat_cache nc socket - gateways nc_nodes transtable_global +There are additional settings for each batadv mesh interface, vlan and hardif +which can be modified using batctl. Detailed information about this can be found +in its manual. -Some of the files contain all sort of status information regarding the mesh -network. For example, you can view the table of originators (mesh -participants) with:: +For instance, you can check the current originator interval (value +in milliseconds which determines how often batman-adv sends its broadcast +packets):: - $ cat /sys/kernel/debug/batman_adv/bat0/originators - -Other files allow to change batman's behaviour to better fit your requirements. -For instance, you can check the current originator interval (value in -milliseconds which determines how often batman sends its broadcast packets):: - - $ cat /sys/class/net/bat0/mesh/orig_interval + $ batctl -M bat0 orig_interval 1000 and also change its value:: - $ echo 3000 > /sys/class/net/bat0/mesh/orig_interval + $ batctl -M bat0 orig_interval 3000 In very mobile scenarios, you might want to adjust the originator interval to a lower value. This will make the mesh more responsive to topology changes, but will also increase the overhead. +Information about the current state can be accessed via the batadv generic +netlink family. batctl provides human readable version via its debug tables +subcommands. + Usage ===== @@ -147,43 +120,16 @@ batman-adv module. When building batman-adv as part of kernel, use "make menuconfig" and enable the option ``B.A.T.M.A.N. debugging`` (``CONFIG_BATMAN_ADV_DEBUG=y``). -Those additional debug messages can be accessed using a special file in -debugfs:: +Those additional debug messages can be accessed using the perf infrastructure:: - $ cat /sys/kernel/debug/batman_adv/bat0/log + $ trace-cmd stream -e batadv:batadv_dbg The additional debug output is by default disabled. It can be enabled during -run time. Following log_levels are defined: - -.. flat-table:: - - * - 0 - - All debug output disabled - * - 1 - - Enable messages related to routing / flooding / broadcasting - * - 2 - - Enable messages related to route added / changed / deleted - * - 4 - - Enable messages related to translation table operations - * - 8 - - Enable messages related to bridge loop avoidance - * - 16 - - Enable messages related to DAT, ARP snooping and parsing - * - 32 - - Enable messages related to network coding - * - 64 - - Enable messages related to multicast - * - 128 - - Enable messages related to throughput meter - * - 255 - - Enable all messages - -The debug output can be changed at runtime using the file -``/sys/class/net/bat0/mesh/log_level``. e.g.:: - - $ echo 6 > /sys/class/net/bat0/mesh/log_level - -will enable debug messages for when routes change. +run time:: + + $ batctl -m bat0 loglevel routes tt + +will enable debug messages for when routes and translation table entries change. Counters for different types of packets entering and leaving the batman-adv module are available through ethtool:: diff --git a/Documentation/networking/decnet.txt b/Documentation/networking/decnet.txt index e12a4900cf72..d192f8b9948b 100644 --- a/Documentation/networking/decnet.txt +++ b/Documentation/networking/decnet.txt @@ -22,8 +22,6 @@ you'll need the following options as well... CONFIG_DECNET_ROUTER (to be able to add/delete routes) CONFIG_NETFILTER (will be required for the DECnet routing daemon) - CONFIG_DECNET_ROUTE_FWMARK is optional - Don't turn on SIOCGIFCONF support for DECnet unless you are really sure that you need it, in general you won't and it can cause ifconfig to malfunction. diff --git a/Documentation/networking/devlink-info-versions.rst b/Documentation/networking/devlink-info-versions.rst index c79ad8593383..4316342b7746 100644 --- a/Documentation/networking/devlink-info-versions.rst +++ b/Documentation/networking/devlink-info-versions.rst @@ -41,3 +41,8 @@ fw.ncsi Version of the software responsible for supporting/handling the Network Controller Sideband Interface. + +fw.psid +======= + +Unique identifier of the firmware parameter set. diff --git a/Documentation/networking/dsa/bcm_sf2.txt b/Documentation/networking/dsa/bcm_sf2.rst index eba3a2431e91..dee234039e1e 100644 --- a/Documentation/networking/dsa/bcm_sf2.txt +++ b/Documentation/networking/dsa/bcm_sf2.rst @@ -1,3 +1,4 @@ +============================================= Broadcom Starfighter 2 Ethernet switch driver ============================================= @@ -25,27 +26,27 @@ are connected at a lower speed. The switch hardware block is typically interfaced using MMIO accesses and contains a bunch of sub-blocks/registers: -* SWITCH_CORE: common switch registers -* SWITCH_REG: external interfaces switch register -* SWITCH_MDIO: external MDIO bus controller (there is another one in SWITCH_CORE, +- ``SWITCH_CORE``: common switch registers +- ``SWITCH_REG``: external interfaces switch register +- ``SWITCH_MDIO``: external MDIO bus controller (there is another one in SWITCH_CORE, which is used for indirect PHY accesses) -* SWITCH_INDIR_RW: 64-bits wide register helper block -* SWITCH_INTRL2_0/1: Level-2 interrupt controllers -* SWITCH_ACB: Admission control block -* SWITCH_FCB: Fail-over control block +- ``SWITCH_INDIR_RW``: 64-bits wide register helper block +- ``SWITCH_INTRL2_0/1``: Level-2 interrupt controllers +- ``SWITCH_ACB``: Admission control block +- ``SWITCH_FCB``: Fail-over control block Implementation details ====================== -The driver is located in drivers/net/dsa/bcm_sf2.c and is implemented as a DSA -driver; see Documentation/networking/dsa/dsa.txt for details on the subsystem +The driver is located in ``drivers/net/dsa/bcm_sf2.c`` and is implemented as a DSA +driver; see ``Documentation/networking/dsa/dsa.rst`` for details on the subsystem and what it provides. The SF2 switch is configured to enable a Broadcom specific 4-bytes switch tag which gets inserted by the switch for every packet forwarded to the CPU interface, conversely, the CPU network interface should insert a similar tag for packets entering the CPU port. The tag format is described in -net/dsa/tag_brcm.c. +``net/dsa/tag_brcm.c``. Overall, the SF2 driver is a fairly regular DSA driver; there are a few specifics covered below. @@ -54,7 +55,7 @@ Device Tree probing ------------------- The DSA platform device driver is probed using a specific compatible string -provided in net/dsa/dsa.c. The reason for that is because the DSA subsystem gets +provided in ``net/dsa/dsa.c``. The reason for that is because the DSA subsystem gets registered as a platform device driver currently. DSA will provide the needed device_node pointers which are then accessible by the switch driver setup function to setup resources such as register ranges and interrupts. This @@ -70,7 +71,7 @@ Broadcom switches connected to a SF2 require the use of the DSA slave MDIO bus in order to properly configure them. By default, the SF2 pseudo-PHY address, and an external switch pseudo-PHY address will both be snooping for incoming MDIO transactions, since they are at the same address (30), resulting in some kind of -"double" programming. Using DSA, and setting ds->phys_mii_mask accordingly, we +"double" programming. Using DSA, and setting ``ds->phys_mii_mask`` accordingly, we selectively divert reads and writes towards external Broadcom switches pseudo-PHY addresses. Newer revisions of the SF2 hardware have introduced a configurable pseudo-PHY address which circumvents the initial design limitation. @@ -86,7 +87,7 @@ firmware gets reloaded. The SF2 driver relies on such events to properly set its MoCA interface carrier state and properly report this to the networking stack. The MoCA interfaces are supported using the PHY library's fixed PHY/emulated PHY -device and the switch driver registers a fixed_link_update callback for such +device and the switch driver registers a ``fixed_link_update`` callback for such PHYs which reflects the link state obtained from the interrupt handler. diff --git a/Documentation/networking/dsa/dsa.txt b/Documentation/networking/dsa/dsa.rst index 43ef767bc440..ca87068b9ab9 100644 --- a/Documentation/networking/dsa/dsa.txt +++ b/Documentation/networking/dsa/dsa.rst @@ -1,10 +1,8 @@ -Distributed Switch Architecture -=============================== - -Introduction +============ +Architecture ============ -This document describes the Distributed Switch Architecture (DSA) subsystem +This document describes the **Distributed Switch Architecture (DSA)** subsystem design principles, limitations, interactions with other subsystems, and how to develop drivers for this subsystem as well as a TODO for developers interested in joining the effort. @@ -70,11 +68,11 @@ Switch tagging protocols DSA currently supports 5 different tagging protocols, and a tag-less mode as well. The different protocols are implemented in: -net/dsa/tag_trailer.c: Marvell's 4 trailer tag mode (legacy) -net/dsa/tag_dsa.c: Marvell's original DSA tag -net/dsa/tag_edsa.c: Marvell's enhanced DSA tag -net/dsa/tag_brcm.c: Broadcom's 4 bytes tag -net/dsa/tag_qca.c: Qualcomm's 2 bytes tag +- ``net/dsa/tag_trailer.c``: Marvell's 4 trailer tag mode (legacy) +- ``net/dsa/tag_dsa.c``: Marvell's original DSA tag +- ``net/dsa/tag_edsa.c``: Marvell's enhanced DSA tag +- ``net/dsa/tag_brcm.c``: Broadcom's 4 bytes tag +- ``net/dsa/tag_qca.c``: Qualcomm's 2 bytes tag The exact format of the tag protocol is vendor specific, but in general, they all contain something which: @@ -89,7 +87,7 @@ Master network devices are regular, unmodified Linux network device drivers for the CPU/management Ethernet interface. Such a driver might occasionally need to know whether DSA is enabled (e.g.: to enable/disable specific offload features), but the DSA subsystem has been proven to work with industry standard drivers: -e1000e, mv643xx_eth etc. without having to introduce modifications to these +``e1000e,`` ``mv643xx_eth`` etc. without having to introduce modifications to these drivers. Such network devices are also often referred to as conduit network devices since they act as a pipe between the host processor and the hardware Ethernet switch. @@ -100,40 +98,42 @@ Networking stack hooks When a master netdev is used with DSA, a small hook is placed in in the networking stack is in order to have the DSA subsystem process the Ethernet switch specific tagging protocol. DSA accomplishes this by registering a -specific (and fake) Ethernet type (later becoming skb->protocol) with the -networking stack, this is also known as a ptype or packet_type. A typical +specific (and fake) Ethernet type (later becoming ``skb->protocol``) with the +networking stack, this is also known as a ``ptype`` or ``packet_type``. A typical Ethernet Frame receive sequence looks like this: Master network device (e.g.: e1000e): -Receive interrupt fires: -- receive function is invoked -- basic packet processing is done: getting length, status etc. -- packet is prepared to be processed by the Ethernet layer by calling - eth_type_trans +1. Receive interrupt fires: + + - receive function is invoked + - basic packet processing is done: getting length, status etc. + - packet is prepared to be processed by the Ethernet layer by calling + ``eth_type_trans`` + +2. net/ethernet/eth.c:: + + eth_type_trans(skb, dev) + if (dev->dsa_ptr != NULL) + -> skb->protocol = ETH_P_XDSA -net/ethernet/eth.c: +3. drivers/net/ethernet/\*:: -eth_type_trans(skb, dev) - if (dev->dsa_ptr != NULL) - -> skb->protocol = ETH_P_XDSA + netif_receive_skb(skb) + -> iterate over registered packet_type + -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv() -drivers/net/ethernet/*: +4. net/dsa/dsa.c:: -netif_receive_skb(skb) - -> iterate over registered packet_type - -> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv() + -> dsa_switch_rcv() + -> invoke switch tag specific protocol handler in 'net/dsa/tag_*.c' -net/dsa/dsa.c: - -> dsa_switch_rcv() - -> invoke switch tag specific protocol handler in - net/dsa/tag_*.c +5. net/dsa/tag_*.c: -net/dsa/tag_*.c: - -> inspect and strip switch tag protocol to determine originating port - -> locate per-port network device - -> invoke eth_type_trans() with the DSA slave network device - -> invoked netif_receive_skb() + - inspect and strip switch tag protocol to determine originating port + - locate per-port network device + - invoke ``eth_type_trans()`` with the DSA slave network device + - invoked ``netif_receive_skb()`` Past this point, the DSA slave network devices get delivered regular Ethernet frames that can be processed by the networking stack. @@ -162,7 +162,7 @@ invoke a specific transmit routine which takes care of adding the relevant switch tag in the Ethernet frames. These frames are then queued for transmission using the master network device -ndo_start_xmit() function, since they contain the appropriate switch tag, the +``ndo_start_xmit()`` function, since they contain the appropriate switch tag, the Ethernet switch will be able to process these incoming frames from the management interface and delivers these frames to the physical switch port. @@ -170,23 +170,25 @@ Graphical representation ------------------------ Summarized, this is basically how DSA looks like from a network device -perspective: - - - |--------------------------- - | CPU network device (eth0)| - ---------------------------- - | <tag added by switch | - | | - | | - | tag added by CPU> | - |--------------------------------------------| - | Switch driver | - |--------------------------------------------| - || || || - |-------| |-------| |-------| - | sw0p0 | | sw0p1 | | sw0p2 | - |-------| |-------| |-------| +perspective:: + + + |--------------------------- + | CPU network device (eth0)| + ---------------------------- + | <tag added by switch | + | | + | | + | tag added by CPU> | + |--------------------------------------------| + | Switch driver | + |--------------------------------------------| + || || || + |-------| |-------| |-------| + | sw0p0 | | sw0p1 | | sw0p2 | + |-------| |-------| |-------| + + Slave MDIO bus -------------- @@ -207,31 +209,32 @@ PHYs, external PHYs, or even external switches. Data structures --------------- -DSA data structures are defined in include/net/dsa.h as well as -net/dsa/dsa_priv.h. +DSA data structures are defined in ``include/net/dsa.h`` as well as +``net/dsa/dsa_priv.h``: -dsa_chip_data: platform data configuration for a given switch device, this -structure describes a switch device's parent device, its address, as well as -various properties of its ports: names/labels, and finally a routing table -indication (when cascading switches) +- ``dsa_chip_data``: platform data configuration for a given switch device, + this structure describes a switch device's parent device, its address, as + well as various properties of its ports: names/labels, and finally a routing + table indication (when cascading switches) -dsa_platform_data: platform device configuration data which can reference a -collection of dsa_chip_data structure if multiples switches are cascaded, the -master network device this switch tree is attached to needs to be referenced +- ``dsa_platform_data``: platform device configuration data which can reference + a collection of dsa_chip_data structure if multiples switches are cascaded, + the master network device this switch tree is attached to needs to be + referenced -dsa_switch_tree: structure assigned to the master network device under -"dsa_ptr", this structure references a dsa_platform_data structure as well as -the tagging protocol supported by the switch tree, and which receive/transmit -function hooks should be invoked, information about the directly attached switch -is also provided: CPU port. Finally, a collection of dsa_switch are referenced -to address individual switches in the tree. +- ``dsa_switch_tree``: structure assigned to the master network device under + ``dsa_ptr``, this structure references a dsa_platform_data structure as well as + the tagging protocol supported by the switch tree, and which receive/transmit + function hooks should be invoked, information about the directly attached + switch is also provided: CPU port. Finally, a collection of dsa_switch are + referenced to address individual switches in the tree. -dsa_switch: structure describing a switch device in the tree, referencing a -dsa_switch_tree as a backpointer, slave network devices, master network device, -and a reference to the backing dsa_switch_ops +- ``dsa_switch``: structure describing a switch device in the tree, referencing + a ``dsa_switch_tree`` as a backpointer, slave network devices, master network + device, and a reference to the backing``dsa_switch_ops`` -dsa_switch_ops: structure referencing function pointers, see below for a full -description. +- ``dsa_switch_ops``: structure referencing function pointers, see below for a + full description. Design limitations ================== @@ -240,7 +243,7 @@ Limits on the number of devices and ports ----------------------------------------- DSA currently limits the number of maximum switches within a tree to 4 -(DSA_MAX_SWITCHES), and the number of ports per switch to 12 (DSA_MAX_PORTS). +(``DSA_MAX_SWITCHES``), and the number of ports per switch to 12 (``DSA_MAX_PORTS``). These limits could be extended to support larger configurations would this need arise. @@ -279,15 +282,15 @@ Interactions with other subsystems DSA currently leverages the following subsystems: -- MDIO/PHY library: drivers/net/phy/phy.c, mdio_bus.c -- Switchdev: net/switchdev/* +- MDIO/PHY library: ``drivers/net/phy/phy.c``, ``mdio_bus.c`` +- Switchdev:``net/switchdev/*`` - Device Tree for various of_* functions MDIO/PHY library ---------------- Slave network devices exposed by DSA may or may not be interfacing with PHY -devices (struct phy_device as defined in include/linux/phy.h), but the DSA +devices (``struct phy_device`` as defined in ``include/linux/phy.h)``, but the DSA subsystem deals with all possible combinations: - internal PHY devices, built into the Ethernet switch hardware @@ -296,16 +299,16 @@ subsystem deals with all possible combinations: - special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a fixed PHYs -The PHY configuration is done by the dsa_slave_phy_setup() function and the +The PHY configuration is done by the ``dsa_slave_phy_setup()`` function and the logic basically looks like this: - if Device Tree is used, the PHY device is looked up using the standard "phy-handle" property, if found, this PHY device is created and registered - using of_phy_connect() + using ``of_phy_connect()`` - if Device Tree is used, and the PHY device is "fixed", that is, conforms to the definition of a non-MDIO managed PHY as defined in - Documentation/devicetree/bindings/net/fixed-link.txt, the PHY is registered + ``Documentation/devicetree/bindings/net/fixed-link.txt``, the PHY is registered and connected transparently using the special fixed MDIO bus driver - finally, if the PHY is built into the switch, as is very common with @@ -331,8 +334,8 @@ Device Tree ----------- DSA features a standardized binding which is documented in -Documentation/devicetree/bindings/net/dsa/dsa.txt. PHY/MDIO library helper -functions such as of_get_phy_mode(), of_phy_connect() are also used to query +``Documentation/devicetree/bindings/net/dsa/dsa.txt``. PHY/MDIO library helper +functions such as ``of_get_phy_mode()``, ``of_phy_connect()`` are also used to query per-port PHY specific details: interface connection, MDIO bus location etc.. Driver development @@ -341,8 +344,8 @@ Driver development DSA switch drivers need to implement a dsa_switch_ops structure which will contain the various members described below. -register_switch_driver() registers this dsa_switch_ops in its internal list -of drivers to probe for. unregister_switch_driver() does the exact opposite. +``register_switch_driver()`` registers this dsa_switch_ops in its internal list +of drivers to probe for. ``unregister_switch_driver()`` does the exact opposite. Unless requested differently by setting the priv_size member accordingly, DSA does not allocate any driver private context space. @@ -350,17 +353,17 @@ does not allocate any driver private context space. Switch configuration -------------------- -- tag_protocol: this is to indicate what kind of tagging protocol is supported, - should be a valid value from the dsa_tag_protocol enum +- ``tag_protocol``: this is to indicate what kind of tagging protocol is supported, + should be a valid value from the ``dsa_tag_protocol`` enum -- probe: probe routine which will be invoked by the DSA platform device upon +- ``probe``: probe routine which will be invoked by the DSA platform device upon registration to test for the presence/absence of a switch device. For MDIO devices, it is recommended to issue a read towards internal registers using the switch pseudo-PHY and return whether this is a supported device. For other buses, return a non-NULL string -- setup: setup function for the switch, this function is responsible for setting - up the dsa_switch_ops private structure with all it needs: register maps, +- ``setup``: setup function for the switch, this function is responsible for setting + up the ``dsa_switch_ops`` private structure with all it needs: register maps, interrupts, mutexes, locks etc.. This function is also expected to properly configure the switch to separate all network interfaces from each other, that is, they should be isolated by the switch hardware itself, typically by creating @@ -375,27 +378,27 @@ Switch configuration PHY devices and link management ------------------------------- -- get_phy_flags: Some switches are interfaced to various kinds of Ethernet PHYs, +- ``get_phy_flags``: Some switches are interfaced to various kinds of Ethernet PHYs, if the PHY library PHY driver needs to know about information it cannot obtain on its own (e.g.: coming from switch memory mapped registers), this function should return a 32-bits bitmask of "flags", that is private between the switch - driver and the Ethernet PHY driver in drivers/net/phy/*. + driver and the Ethernet PHY driver in ``drivers/net/phy/\*``. -- phy_read: Function invoked by the DSA slave MDIO bus when attempting to read +- ``phy_read``: Function invoked by the DSA slave MDIO bus when attempting to read the switch port MDIO registers. If unavailable, return 0xffff for each read. For builtin switch Ethernet PHYs, this function should allow reading the link status, auto-negotiation results, link partner pages etc.. -- phy_write: Function invoked by the DSA slave MDIO bus when attempting to write +- ``phy_write``: Function invoked by the DSA slave MDIO bus when attempting to write to the switch port MDIO registers. If unavailable return a negative error code. -- adjust_link: Function invoked by the PHY library when a slave network device +- ``adjust_link``: Function invoked by the PHY library when a slave network device is attached to a PHY device. This function is responsible for appropriately configuring the switch port link parameters: speed, duplex, pause based on - what the phy_device is providing. + what the ``phy_device`` is providing. -- fixed_link_update: Function invoked by the PHY library, and specifically by +- ``fixed_link_update``: Function invoked by the PHY library, and specifically by the fixed PHY driver asking the switch driver for link parameters that could not be auto-negotiated, or obtained by reading the PHY registers through MDIO. This is particularly useful for specific kinds of hardware such as QSGMII, @@ -405,87 +408,87 @@ PHY devices and link management Ethtool operations ------------------ -- get_strings: ethtool function used to query the driver's strings, will +- ``get_strings``: ethtool function used to query the driver's strings, will typically return statistics strings, private flags strings etc. -- get_ethtool_stats: ethtool function used to query per-port statistics and +- ``get_ethtool_stats``: ethtool function used to query per-port statistics and return their values. DSA overlays slave network devices general statistics: RX/TX counters from the network device, with switch driver specific statistics per port -- get_sset_count: ethtool function used to query the number of statistics items +- ``get_sset_count``: ethtool function used to query the number of statistics items -- get_wol: ethtool function used to obtain Wake-on-LAN settings per-port, this +- ``get_wol``: ethtool function used to obtain Wake-on-LAN settings per-port, this function may, for certain implementations also query the master network device Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN -- set_wol: ethtool function used to configure Wake-on-LAN settings per-port, +- ``set_wol``: ethtool function used to configure Wake-on-LAN settings per-port, direct counterpart to set_wol with similar restrictions -- set_eee: ethtool function which is used to configure a switch port EEE (Green +- ``set_eee``: ethtool function which is used to configure a switch port EEE (Green Ethernet) settings, can optionally invoke the PHY library to enable EEE at the PHY level if relevant. This function should enable EEE at the switch port MAC controller and data-processing logic -- get_eee: ethtool function which is used to query a switch port EEE settings, +- ``get_eee``: ethtool function which is used to query a switch port EEE settings, this function should return the EEE state of the switch port MAC controller and data-processing logic as well as query the PHY for its currently configured EEE settings -- get_eeprom_len: ethtool function returning for a given switch the EEPROM +- ``get_eeprom_len``: ethtool function returning for a given switch the EEPROM length/size in bytes -- get_eeprom: ethtool function returning for a given switch the EEPROM contents +- ``get_eeprom``: ethtool function returning for a given switch the EEPROM contents -- set_eeprom: ethtool function writing specified data to a given switch EEPROM +- ``set_eeprom``: ethtool function writing specified data to a given switch EEPROM -- get_regs_len: ethtool function returning the register length for a given +- ``get_regs_len``: ethtool function returning the register length for a given switch -- get_regs: ethtool function returning the Ethernet switch internal register +- ``get_regs``: ethtool function returning the Ethernet switch internal register contents. This function might require user-land code in ethtool to pretty-print register values and registers Power management ---------------- -- suspend: function invoked by the DSA platform device when the system goes to +- ``suspend``: function invoked by the DSA platform device when the system goes to suspend, should quiesce all Ethernet switch activities, but keep ports participating in Wake-on-LAN active as well as additional wake-up logic if supported -- resume: function invoked by the DSA platform device when the system resumes, +- ``resume``: function invoked by the DSA platform device when the system resumes, should resume all Ethernet switch activities and re-configure the switch to be in a fully active state -- port_enable: function invoked by the DSA slave network device ndo_open +- ``port_enable``: function invoked by the DSA slave network device ndo_open function when a port is administratively brought up, this function should be fully enabling a given switch port. DSA takes care of marking the port with - BR_STATE_BLOCKING if the port is a bridge member, or BR_STATE_FORWARDING if it + ``BR_STATE_BLOCKING`` if the port is a bridge member, or ``BR_STATE_FORWARDING`` if it was not, and propagating these changes down to the hardware -- port_disable: function invoked by the DSA slave network device ndo_close +- ``port_disable``: function invoked by the DSA slave network device ndo_close function when a port is administratively brought down, this function should be fully disabling a given switch port. DSA takes care of marking the port with - BR_STATE_DISABLED and propagating changes to the hardware if this port is + ``BR_STATE_DISABLED`` and propagating changes to the hardware if this port is disabled while being a bridge member Bridge layer ------------ -- port_bridge_join: bridge layer function invoked when a given switch port is +- ``port_bridge_join``: bridge layer function invoked when a given switch port is added to a bridge, this function should be doing the necessary at the switch level to permit the joining port from being added to the relevant logical domain for it to ingress/egress traffic with other members of the bridge. -- port_bridge_leave: bridge layer function invoked when a given switch port is +- ``port_bridge_leave``: bridge layer function invoked when a given switch port is removed from a bridge, this function should be doing the necessary at the switch level to deny the leaving port from ingress/egress traffic from the remaining bridge members. When the port leaves the bridge, it should be aged out at the switch hardware for the switch to (re) learn MAC addresses behind this port. -- port_stp_state_set: bridge layer function invoked when a given switch port STP +- ``port_stp_state_set``: bridge layer function invoked when a given switch port STP state is computed by the bridge layer and should be propagated to switch hardware to forward/block/learn traffic. The switch driver is responsible for computing a STP state change based on current and asked parameters and perform @@ -494,7 +497,7 @@ Bridge layer Bridge VLAN filtering --------------------- -- port_vlan_filtering: bridge layer function invoked when the bridge gets +- ``port_vlan_filtering``: bridge layer function invoked when the bridge gets configured for turning on or off VLAN filtering. If nothing specific needs to be done at the hardware level, this callback does not need to be implemented. When VLAN filtering is turned on, the hardware must be programmed with @@ -504,61 +507,61 @@ Bridge VLAN filtering accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are allowed. -- port_vlan_prepare: bridge layer function invoked when the bridge prepares the +- ``port_vlan_prepare``: bridge layer function invoked when the bridge prepares the configuration of a VLAN on the given port. If the operation is not supported - by the hardware, this function should return -EOPNOTSUPP to inform the bridge + by the hardware, this function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to a software implementation. No hardware setup must be done in this function. See port_vlan_add for this and details. -- port_vlan_add: bridge layer function invoked when a VLAN is configured +- ``port_vlan_add``: bridge layer function invoked when a VLAN is configured (tagged or untagged) for the given switch port -- port_vlan_del: bridge layer function invoked when a VLAN is removed from the +- ``port_vlan_del``: bridge layer function invoked when a VLAN is removed from the given switch port -- port_vlan_dump: bridge layer function invoked with a switchdev callback +- ``port_vlan_dump``: bridge layer function invoked with a switchdev callback function that the driver has to call for each VLAN the given port is a member of. A switchdev object is used to carry the VID and bridge flags. -- port_fdb_add: bridge layer function invoked when the bridge wants to install a +- ``port_fdb_add``: bridge layer function invoked when the bridge wants to install a Forwarding Database entry, the switch hardware should be programmed with the specified address in the specified VLAN Id in the forwarding database associated with this VLAN ID. If the operation is not supported, this - function should return -EOPNOTSUPP to inform the bridge code to fallback to + function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to a software implementation. -Note: VLAN ID 0 corresponds to the port private database, which, in the context -of DSA, would be the its port-based VLAN, used by the associated bridge device. +.. note:: VLAN ID 0 corresponds to the port private database, which, in the context + of DSA, would be the its port-based VLAN, used by the associated bridge device. -- port_fdb_del: bridge layer function invoked when the bridge wants to remove a +- ``port_fdb_del``: bridge layer function invoked when the bridge wants to remove a Forwarding Database entry, the switch hardware should be programmed to delete the specified MAC address from the specified VLAN ID if it was mapped into this port forwarding database -- port_fdb_dump: bridge layer function invoked with a switchdev callback +- ``port_fdb_dump``: bridge layer function invoked with a switchdev callback function that the driver has to call for each MAC address known to be behind the given port. A switchdev object is used to carry the VID and FDB info. -- port_mdb_prepare: bridge layer function invoked when the bridge prepares the +- ``port_mdb_prepare``: bridge layer function invoked when the bridge prepares the installation of a multicast database entry. If the operation is not supported, - this function should return -EOPNOTSUPP to inform the bridge code to fallback + this function should return ``-EOPNOTSUPP`` to inform the bridge code to fallback to a software implementation. No hardware setup must be done in this function. - See port_fdb_add for this and details. + See ``port_fdb_add`` for this and details. -- port_mdb_add: bridge layer function invoked when the bridge wants to install +- ``port_mdb_add``: bridge layer function invoked when the bridge wants to install a multicast database entry, the switch hardware should be programmed with the specified address in the specified VLAN ID in the forwarding database associated with this VLAN ID. -Note: VLAN ID 0 corresponds to the port private database, which, in the context -of DSA, would be the its port-based VLAN, used by the associated bridge device. +.. note:: VLAN ID 0 corresponds to the port private database, which, in the context + of DSA, would be the its port-based VLAN, used by the associated bridge device. -- port_mdb_del: bridge layer function invoked when the bridge wants to remove a +- ``port_mdb_del``: bridge layer function invoked when the bridge wants to remove a multicast database entry, the switch hardware should be programmed to delete the specified MAC address from the specified VLAN ID if it was mapped into this port forwarding database. -- port_mdb_dump: bridge layer function invoked with a switchdev callback +- ``port_mdb_dump``: bridge layer function invoked with a switchdev callback function that the driver has to call for each MAC address known to be behind the given port. A switchdev object is used to carry the VID and MDB info. @@ -577,7 +580,7 @@ two subsystems and get the best of both worlds. Other hanging fruits -------------------- -- making the number of ports fully dynamic and not dependent on DSA_MAX_PORTS +- making the number of ports fully dynamic and not dependent on ``DSA_MAX_PORTS`` - allowing more than one CPU/management interface: http://comments.gmane.org/gmane.linux.network/365657 - porting more drivers from other vendors: diff --git a/Documentation/networking/dsa/index.rst b/Documentation/networking/dsa/index.rst new file mode 100644 index 000000000000..0e5b7a9be406 --- /dev/null +++ b/Documentation/networking/dsa/index.rst @@ -0,0 +1,11 @@ +=============================== +Distributed Switch Architecture +=============================== + +.. toctree:: + :maxdepth: 1 + + dsa + bcm_sf2 + lan9303 + sja1105 diff --git a/Documentation/networking/dsa/lan9303.txt b/Documentation/networking/dsa/lan9303.rst index 144b02b95207..e3c820db28ad 100644 --- a/Documentation/networking/dsa/lan9303.txt +++ b/Documentation/networking/dsa/lan9303.rst @@ -1,3 +1,4 @@ +============================== LAN9303 Ethernet switch driver ============================== @@ -9,10 +10,9 @@ host master network interface (e.g. fixed link). Driver details ============== -The driver is implemented as a DSA driver, see -Documentation/networking/dsa/dsa.txt. +The driver is implemented as a DSA driver, see ``Documentation/networking/dsa/dsa.rst``. -See Documentation/devicetree/bindings/net/dsa/lan9303.txt for device tree +See ``Documentation/devicetree/bindings/net/dsa/lan9303.txt`` for device tree binding. The LAN9303 can be managed both via MDIO and I2C, both supported by this driver. diff --git a/Documentation/networking/dsa/sja1105.rst b/Documentation/networking/dsa/sja1105.rst new file mode 100644 index 000000000000..ea7bac438cfd --- /dev/null +++ b/Documentation/networking/dsa/sja1105.rst @@ -0,0 +1,220 @@ +========================= +NXP SJA1105 switch driver +========================= + +Overview +======== + +The NXP SJA1105 is a family of 6 devices: + +- SJA1105E: First generation, no TTEthernet +- SJA1105T: First generation, TTEthernet +- SJA1105P: Second generation, no TTEthernet, no SGMII +- SJA1105Q: Second generation, TTEthernet, no SGMII +- SJA1105R: Second generation, no TTEthernet, SGMII +- SJA1105S: Second generation, TTEthernet, SGMII + +These are SPI-managed automotive switches, with all ports being gigabit +capable, and supporting MII/RMII/RGMII and optionally SGMII on one port. + +Being automotive parts, their configuration interface is geared towards +set-and-forget use, with minimal dynamic interaction at runtime. They +require a static configuration to be composed by software and packed +with CRC and table headers, and sent over SPI. + +The static configuration is composed of several configuration tables. Each +table takes a number of entries. Some configuration tables can be (partially) +reconfigured at runtime, some not. Some tables are mandatory, some not: + +============================= ================== ============================= +Table Mandatory Reconfigurable +============================= ================== ============================= +Schedule no no +Schedule entry points if Scheduling no +VL Lookup no no +VL Policing if VL Lookup no +VL Forwarding if VL Lookup no +L2 Lookup no no +L2 Policing yes no +VLAN Lookup yes yes +L2 Forwarding yes partially (fully on P/Q/R/S) +MAC Config yes partially (fully on P/Q/R/S) +Schedule Params if Scheduling no +Schedule Entry Points Params if Scheduling no +VL Forwarding Params if VL Forwarding no +L2 Lookup Params no partially (fully on P/Q/R/S) +L2 Forwarding Params yes no +Clock Sync Params no no +AVB Params no no +General Params yes partially +Retagging no yes +xMII Params yes no +SGMII no yes +============================= ================== ============================= + + +Also the configuration is write-only (software cannot read it back from the +switch except for very few exceptions). + +The driver creates a static configuration at probe time, and keeps it at +all times in memory, as a shadow for the hardware state. When required to +change a hardware setting, the static configuration is also updated. +If that changed setting can be transmitted to the switch through the dynamic +reconfiguration interface, it is; otherwise the switch is reset and +reprogrammed with the updated static configuration. + +Traffic support +=============== + +The switches do not support switch tagging in hardware. But they do support +customizing the TPID by which VLAN traffic is identified as such. The switch +driver is leveraging ``CONFIG_NET_DSA_TAG_8021Q`` by requesting that special +VLANs (with a custom TPID of ``ETH_P_EDSA`` instead of ``ETH_P_8021Q``) are +installed on its ports when not in ``vlan_filtering`` mode. This does not +interfere with the reception and transmission of real 802.1Q-tagged traffic, +because the switch does no longer parse those packets as VLAN after the TPID +change. +The TPID is restored when ``vlan_filtering`` is requested by the user through +the bridge layer, and general IP termination becomes no longer possible through +the switch netdevices in this mode. + +The switches have two programmable filters for link-local destination MACs. +These are used to trap BPDUs and PTP traffic to the master netdevice, and are +further used to support STP and 1588 ordinary clock/boundary clock +functionality. + +The following traffic modes are supported over the switch netdevices: + ++--------------------+------------+------------------+------------------+ +| | Standalone | Bridged with | Bridged with | +| | ports | vlan_filtering 0 | vlan_filtering 1 | ++====================+============+==================+==================+ +| Regular traffic | Yes | Yes | No (use master) | ++--------------------+------------+------------------+------------------+ +| Management traffic | Yes | Yes | Yes | +| (BPDU, PTP) | | | | ++--------------------+------------+------------------+------------------+ + +Switching features +================== + +The driver supports the configuration of L2 forwarding rules in hardware for +port bridging. The forwarding, broadcast and flooding domain between ports can +be restricted through two methods: either at the L2 forwarding level (isolate +one bridge's ports from another's) or at the VLAN port membership level +(isolate ports within the same bridge). The final forwarding decision taken by +the hardware is a logical AND of these two sets of rules. + +The hardware tags all traffic internally with a port-based VLAN (pvid), or it +decodes the VLAN information from the 802.1Q tag. Advanced VLAN classification +is not possible. Once attributed a VLAN tag, frames are checked against the +port's membership rules and dropped at ingress if they don't match any VLAN. +This behavior is available when switch ports are enslaved to a bridge with +``vlan_filtering 1``. + +Normally the hardware is not configurable with respect to VLAN awareness, but +by changing what TPID the switch searches 802.1Q tags for, the semantics of a +bridge with ``vlan_filtering 0`` can be kept (accept all traffic, tagged or +untagged), and therefore this mode is also supported. + +Segregating the switch ports in multiple bridges is supported (e.g. 2 + 2), but +all bridges should have the same level of VLAN awareness (either both have +``vlan_filtering`` 0, or both 1). Also an inevitable limitation of the fact +that VLAN awareness is global at the switch level is that once a bridge with +``vlan_filtering`` enslaves at least one switch port, the other un-bridged +ports are no longer available for standalone traffic termination. + +Topology and loop detection through STP is supported. + +L2 FDB manipulation (add/delete/dump) is currently possible for the first +generation devices. Aging time of FDB entries, as well as enabling fully static +management (no address learning and no flooding of unknown traffic) is not yet +configurable in the driver. + +A special comment about bridging with other netdevices (illustrated with an +example): + +A board has eth0, eth1, swp0@eth1, swp1@eth1, swp2@eth1, swp3@eth1. +The switch ports (swp0-3) are under br0. +It is desired that eth0 is turned into another switched port that communicates +with swp0-3. + +If br0 has vlan_filtering 0, then eth0 can simply be added to br0 with the +intended results. +If br0 has vlan_filtering 1, then a new br1 interface needs to be created that +enslaves eth0 and eth1 (the DSA master of the switch ports). This is because in +this mode, the switch ports beneath br0 are not capable of regular traffic, and +are only used as a conduit for switchdev operations. + +Device Tree bindings and board design +===================================== + +This section references ``Documentation/devicetree/bindings/net/dsa/sja1105.txt`` +and aims to showcase some potential switch caveats. + +RMII PHY role and out-of-band signaling +--------------------------------------- + +In the RMII spec, the 50 MHz clock signals are either driven by the MAC or by +an external oscillator (but not by the PHY). +But the spec is rather loose and devices go outside it in several ways. +Some PHYs go against the spec and may provide an output pin where they source +the 50 MHz clock themselves, in an attempt to be helpful. +On the other hand, the SJA1105 is only binary configurable - when in the RMII +MAC role it will also attempt to drive the clock signal. To prevent this from +happening it must be put in RMII PHY role. +But doing so has some unintended consequences. +In the RMII spec, the PHY can transmit extra out-of-band signals via RXD[1:0]. +These are practically some extra code words (/J/ and /K/) sent prior to the +preamble of each frame. The MAC does not have this out-of-band signaling +mechanism defined by the RMII spec. +So when the SJA1105 port is put in PHY role to avoid having 2 drivers on the +clock signal, inevitably an RMII PHY-to-PHY connection is created. The SJA1105 +emulates a PHY interface fully and generates the /J/ and /K/ symbols prior to +frame preambles, which the real PHY is not expected to understand. So the PHY +simply encodes the extra symbols received from the SJA1105-as-PHY onto the +100Base-Tx wire. +On the other side of the wire, some link partners might discard these extra +symbols, while others might choke on them and discard the entire Ethernet +frames that follow along. This looks like packet loss with some link partners +but not with others. +The take-away is that in RMII mode, the SJA1105 must be let to drive the +reference clock if connected to a PHY. + +RGMII fixed-link and internal delays +------------------------------------ + +As mentioned in the bindings document, the second generation of devices has +tunable delay lines as part of the MAC, which can be used to establish the +correct RGMII timing budget. +When powered up, these can shift the Rx and Tx clocks with a phase difference +between 73.8 and 101.7 degrees. +The catch is that the delay lines need to lock onto a clock signal with a +stable frequency. This means that there must be at least 2 microseconds of +silence between the clock at the old vs at the new frequency. Otherwise the +lock is lost and the delay lines must be reset (powered down and back up). +In RGMII the clock frequency changes with link speed (125 MHz at 1000 Mbps, 25 +MHz at 100 Mbps and 2.5 MHz at 10 Mbps), and link speed might change during the +AN process. +In the situation where the switch port is connected through an RGMII fixed-link +to a link partner whose link state life cycle is outside the control of Linux +(such as a different SoC), then the delay lines would remain unlocked (and +inactive) until there is manual intervention (ifdown/ifup on the switch port). +The take-away is that in RGMII mode, the switch's internal delays are only +reliable if the link partner never changes link speeds, or if it does, it does +so in a way that is coordinated with the switch port (practically, both ends of +the fixed-link are under control of the same Linux system). +As to why would a fixed-link interface ever change link speeds: there are +Ethernet controllers out there which come out of reset in 100 Mbps mode, and +their driver inevitably needs to change the speed and clock frequency if it's +required to work at gigabit. + +MDIO bus and PHY management +--------------------------- + +The SJA1105 does not have an MDIO bus and does not perform in-band AN either. +Therefore there is no link state notification coming from the switch device. +A board would need to hook up the PHYs connected to the switch to any other +MDIO bus available to Linux within the system (e.g. to the DSA master's MDIO +bus). Link state management then works by the driver manually keeping in sync +(over SPI commands) the MAC link speed with the settings negotiated by the PHY. diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst index 5449149be496..f390fe3cfdfb 100644 --- a/Documentation/networking/index.rst +++ b/Documentation/networking/index.rst @@ -24,6 +24,7 @@ Contents: device_drivers/intel/i40e device_drivers/intel/iavf device_drivers/intel/ice + dsa/index devlink-info-versions ieee802154 kapi diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt index acdfb5d2bcaa..14fe93049d28 100644 --- a/Documentation/networking/ip-sysctl.txt +++ b/Documentation/networking/ip-sysctl.txt @@ -81,6 +81,11 @@ fib_multipath_hash_policy - INTEGER 0 - Layer 3 1 - Layer 4 +fib_sync_mem - UNSIGNED INTEGER + Amount of dirty memory from fib entries that can be backlogged before + synchronize_rcu is forced. + Default: 512kB Minimum: 64kB Maximum: 64MB + ip_forward_update_priority - INTEGER Whether to update SKB priority from "TOS" field in IPv4 header after it is forwarded. The new SKB priority is mapped from TOS field value @@ -422,6 +427,7 @@ tcp_min_rtt_wlen - INTEGER minimum RTT when it is moved to a longer path (e.g., due to traffic engineering). A longer window makes the filter more resistant to RTT inflations such as transient congestion. The unit is seconds. + Possible values: 0 - 86400 (1 day) Default: 300 tcp_moderate_rcvbuf - BOOLEAN @@ -554,10 +560,10 @@ tcp_comp_sack_delay_ns - LONG INTEGER Default : 1,000,000 ns (1 ms) tcp_comp_sack_nr - INTEGER - Max numer of SACK that can be compressed. + Max number of SACK that can be compressed. Using 0 disables SACK compression. - Detault : 44 + Default : 44 tcp_slow_start_after_idle - BOOLEAN If set, provide RFC2861 behavior and time out the congestion @@ -1336,6 +1342,7 @@ tag - INTEGER Default value is 0. xfrm4_gc_thresh - INTEGER + (Obsolete since linux-4.14) The threshold at which we will start garbage collecting for IPv4 destination cache entries. At twice this value the system will refuse new allocations. @@ -1908,17 +1915,43 @@ enhanced_dad - BOOLEAN icmp/*: ratelimit - INTEGER - Limit the maximal rates for sending ICMPv6 packets. + Limit the maximal rates for sending ICMPv6 messages. 0 to disable any limiting, otherwise the minimal space between responses in milliseconds. Default: 1000 +ratemask - list of comma separated ranges + For ICMPv6 message types matching the ranges in the ratemask, limit + the sending of the message according to ratelimit parameter. + + The format used for both input and output is a comma separated + list of ranges (e.g. "0-127,129" for ICMPv6 message type 0 to 127 and + 129). Writing to the file will clear all previous ranges of ICMPv6 + message types and update the current list with the input. + + Refer to: https://www.iana.org/assignments/icmpv6-parameters/icmpv6-parameters.xhtml + for numerical values of ICMPv6 message types, e.g. echo request is 128 + and echo reply is 129. + + Default: 0-1,3-127 (rate limit ICMPv6 errors except Packet Too Big) + echo_ignore_all - BOOLEAN If set non-zero, then the kernel will ignore all ICMP ECHO requests sent to it over the IPv6 protocol. Default: 0 +echo_ignore_multicast - BOOLEAN + If set non-zero, then the kernel will ignore all ICMP ECHO + requests sent to it over the IPv6 protocol via multicast. + Default: 0 + +echo_ignore_anycast - BOOLEAN + If set non-zero, then the kernel will ignore all ICMP ECHO + requests sent to it over the IPv6 protocol destined to anycast address. + Default: 0 + xfrm6_gc_thresh - INTEGER + (Obsolete since linux-4.14) The threshold at which we will start garbage collecting for IPv6 destination cache entries. At twice this value the system will refuse new allocations. diff --git a/Documentation/networking/msg_zerocopy.rst b/Documentation/networking/msg_zerocopy.rst index 18c1415e7bfa..ace56204dd03 100644 --- a/Documentation/networking/msg_zerocopy.rst +++ b/Documentation/networking/msg_zerocopy.rst @@ -50,7 +50,7 @@ the excellent reporting over at LWN.net or read the original code. patchset [PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY - http://lkml.kernel.org/r/20170803202945.70750-1-willemdebruijn.kernel@gmail.com + https://lkml.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com Interface diff --git a/Documentation/networking/netdev-FAQ.rst b/Documentation/networking/netdev-FAQ.rst index 0ac5fa77f501..642fa963be3c 100644 --- a/Documentation/networking/netdev-FAQ.rst +++ b/Documentation/networking/netdev-FAQ.rst @@ -131,6 +131,19 @@ it to the maintainer to figure out what is the most recent and current version that should be applied. If there is any doubt, the maintainer will reply and ask what should be done. +Q: I made changes to only a few patches in a patch series should I resend only those changed? +--------------------------------------------------------------------------------------------- +A: No, please resend the entire patch series and make sure you do number your +patches such that it is clear this is the latest and greatest set of patches +that can be applied. + +Q: I submitted multiple versions of a patch series and it looks like a version other than the last one has been accepted, what should I do? +------------------------------------------------------------------------------------------------------------------------------------------- +A: There is no revert possible, once it is pushed out, it stays like that. +Please send incremental versions on top of what has been merged in order to fix +the patches the way they would look like if your latest patch series was to be +merged. + Q: How can I tell what patches are queued up for backporting to the various stable releases? -------------------------------------------------------------------------------------------- A: Normally Greg Kroah-Hartman collects stable commits himself, but for diff --git a/Documentation/networking/nf_flowtable.txt b/Documentation/networking/nf_flowtable.txt index 54128c50d508..ca2136c76042 100644 --- a/Documentation/networking/nf_flowtable.txt +++ b/Documentation/networking/nf_flowtable.txt @@ -44,10 +44,10 @@ including the Netfilter hooks and the flowtable fastpath bypass. / \ / \ |Routing | / \ --> ingress ---> prerouting ---> |decision| | postrouting |--> neigh_xmit \_________/ \__________/ ---------- \____________/ ^ - | ^ | | ^ | - flowtable | | ____\/___ | | - | | | / \ | | - __\/___ | --------->| forward |------------ | + | ^ | ^ | + flowtable | ____\/___ | | + | | / \ | | + __\/___ | | forward |------------ | |-----| | \_________/ | |-----| | 'flow offload' rule | |-----| | adds entry to | diff --git a/Documentation/networking/rxrpc.txt b/Documentation/networking/rxrpc.txt index 2df5894353d6..180e07d956a7 100644 --- a/Documentation/networking/rxrpc.txt +++ b/Documentation/networking/rxrpc.txt @@ -796,7 +796,9 @@ The kernel interface functions are as follows: s64 tx_total_len, gfp_t gfp, rxrpc_notify_rx_t notify_rx, - bool upgrade); + bool upgrade, + bool intr, + unsigned int debug_id); This allocates the infrastructure to make a new RxRPC call and assigns call and connection numbers. The call will be made on the UDP port that @@ -824,6 +826,13 @@ The kernel interface functions are as follows: the server upgrade the service to a better one. The resultant service ID is returned by rxrpc_kernel_recv_data(). + intr should be set to true if the call should be interruptible. If this + is not set, this function may not return until a channel has been + allocated; if it is set, the function may return -ERESTARTSYS. + + debug_id is the call debugging ID to be used for tracing. This can be + obtained by atomically incrementing rxrpc_debug_id. + If this function is successful, an opaque reference to the RxRPC call is returned. The caller now holds a reference on this and it must be properly ended. @@ -1009,16 +1018,18 @@ The kernel interface functions are as follows: (*) Check call still alive. - u32 rxrpc_kernel_check_life(struct socket *sock, - struct rxrpc_call *call); + bool rxrpc_kernel_check_life(struct socket *sock, + struct rxrpc_call *call, + u32 *_life); void rxrpc_kernel_probe_life(struct socket *sock, struct rxrpc_call *call); - The first function returns a number that is updated when ACKs are received - from the peer (notably including PING RESPONSE ACKs which we can elicit by - sending PING ACKs to see if the call still exists on the server). The - caller should compare the numbers of two calls to see if the call is still - alive after waiting for a suitable interval. + The first function passes back in *_life a number that is updated when + ACKs are received from the peer (notably including PING RESPONSE ACKs + which we can elicit by sending PING ACKs to see if the call still exists + on the server). The caller should compare the numbers of two calls to see + if the call is still alive after waiting for a suitable interval. It also + returns true as long as the call hasn't yet reached the completed state. This allows the caller to work out if the server is still contactable and if the call is still alive on the server while waiting for the server to @@ -1054,6 +1065,16 @@ The kernel interface functions are as follows: This value can be used to determine if the remote client has been restarted as it shouldn't change otherwise. + (*) Set the maxmimum lifespan on a call. + + void rxrpc_kernel_set_max_life(struct socket *sock, + struct rxrpc_call *call, + unsigned long hard_timeout) + + This sets the maximum lifespan on a call to hard_timeout (which is in + jiffies). In the event of the timeout occurring, the call will be + aborted and -ETIME or -ETIMEDOUT will be returned. + ======================= CONFIGURABLE PARAMETERS diff --git a/Documentation/networking/segmentation-offloads.rst b/Documentation/networking/segmentation-offloads.rst index 89d1ee933e9f..085e8fab03fd 100644 --- a/Documentation/networking/segmentation-offloads.rst +++ b/Documentation/networking/segmentation-offloads.rst @@ -18,7 +18,7 @@ The following technologies are described: * Generic Segmentation Offload - GSO * Generic Receive Offload - GRO * Partial Generic Segmentation Offload - GSO_PARTIAL - * SCTP accelleration with GSO - GSO_BY_FRAGS + * SCTP acceleration with GSO - GSO_BY_FRAGS TCP Segmentation Offload @@ -148,7 +148,7 @@ that the IPv4 ID field is incremented in the case that a given header does not have the DF bit set. -SCTP accelleration with GSO +SCTP acceleration with GSO =========================== SCTP - despite the lack of hardware support - can still take advantage of diff --git a/Documentation/networking/snmp_counter.rst b/Documentation/networking/snmp_counter.rst index 52b026be028f..38a4edc4522b 100644 --- a/Documentation/networking/snmp_counter.rst +++ b/Documentation/networking/snmp_counter.rst @@ -413,7 +413,7 @@ algorithm. .. _F-RTO: https://tools.ietf.org/html/rfc5682 TCP Fast Path -============ +============= When kernel receives a TCP packet, it has two paths to handler the packet, one is fast path, another is slow path. The comment in kernel code provides a good explanation of them, I pasted them below:: @@ -681,6 +681,7 @@ The TCP stack receives an out of order duplicate packet, so it sends a DSACK to the sender. * TcpExtTCPDSACKRecv + The TCP stack receives a DSACK, which indicates an acknowledged duplicate packet is received. @@ -690,7 +691,7 @@ The TCP stack receives a DSACK, which indicate an out of order duplicate packet is received. invalid SACK and DSACK -==================== +====================== When a SACK (or DSACK) block is invalid, a corresponding counter would be updated. The validation method is base on the start/end sequence number of the SACK block. For more details, please refer the comment @@ -704,11 +705,13 @@ explaination: .. _Add counters for discarded SACK blocks: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=18f02545a9a16c9a89778b91a162ad16d510bb32 * TcpExtTCPSACKDiscard + This counter indicates how many SACK blocks are invalid. If the invalid SACK block is caused by ACK recording, the TCP stack will only ignore it and won't update this counter. * TcpExtTCPDSACKIgnoredOld and TcpExtTCPDSACKIgnoredNoUndo + When a DSACK block is invalid, one of these two counters would be updated. Which counter will be updated depends on the undo_marker flag of the TCP socket. If the undo_marker is not set, the TCP stack isn't @@ -719,7 +722,7 @@ will be updated. If the undo_marker is set, TcpExtTCPDSACKIgnoredOld will be updated. As implied in its name, it might be an old packet. SACK shift -========= +========== The linux networking stack stores data in sk_buff struct (skb for short). If a SACK block acrosses multiple skb, the TCP stack will try to re-arrange data in these skb. E.g. if a SACK block acknowledges seq @@ -730,12 +733,15 @@ seq 14 to 20. All data in skb2 will be moved to skb1, and skb2 will be discard, this operation is 'merge'. * TcpExtTCPSackShifted + A skb is shifted * TcpExtTCPSackMerged + A skb is merged * TcpExtTCPSackShiftFallback + A skb should be shifted or merged, but the TCP stack doesn't do it for some reasons. |