summaryrefslogtreecommitdiffstats
path: root/include/net
Commit message (Collapse)AuthorAgeFilesLines
* ipv6: route: dissect flow in input path if fib rules need itRoopa Prabhu2018-03-012-1/+27
| | | | | | | | | | | | | | | Dissect flow in fwd path if fib rules require it. Controlled by a flag to avoid penatly for the common case. Flag is set when fib rules with sport, dport and proto match that require flow dissect are installed. Also passes the dissected hash keys to the multipath hash function when applicable to avoid dissecting the flow again. icmp packets will continue to use inner header for hash calculations (Thanks to Nikolay Aleksandrov for some review here). Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fib_rules: support for match on ip_proto, sport and dportRoopa Prabhu2018-03-011-1/+35
| | | | | | | | | uapi for ip_proto, sport and dport range match in fib rules. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sch: prio: Add offload ability for grafting a childNogah Frankel2018-02-281-0/+8
| | | | | | | | | | | | Offload sch_prio graft command for capable drivers. Warn in case of a failure, unless the graft was done as part of a destroy operation (the new qdisc is a noop) or if all the qdiscs (the parent, the old child, and the new one) are not offloaded. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Reviewed-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* inet: whitespace cleanupStephen Hemminger2018-02-287-26/+25Star
| | | | | | | | | Ran simple script to find/remove trailing whitespace and blank lines at EOF because that kind of stuff git whines about and editors leave behind. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip_tunnel: Rename & publish init_tunnel_flowPetr Machata2018-02-271-0/+16
| | | | | | | | | | | | Initializing struct flowi4 is useful for drivers that need to emulate routing decisions made by a tunnel interface. Publish the function (appropriately renamed) so that the drivers in question don't need to cut'n'paste it around. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: GRE: Add is_gretap_dev, is_ip6gretap_devPetr Machata2018-02-271-0/+3
| | | | | | | | | | | | | | Determining whether a device is a GRE device is easily done by inspecting struct net_device.type. However, for the tap variants, the type is just ARPHRD_ETHER. Therefore introduce two predicate functions that use netdev_ops to tell the tap devices. Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-02-242-2/+2
|\
| * Merge tag 'mac80211-for-davem-2018-02-22' of ↵David S. Miller2018-02-222-2/+2
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Various fixes across the tree, the shortlog basically says it all: cfg80211: fix cfg80211_beacon_dup -> old bug in this code cfg80211: clear wep keys after disconnection -> certain ways of disconnecting left the keys mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4 -> alignment issues with using 14 bytes mac80211: Do not disconnect on invalid operating class -> if the AP has a bogus operating class, let it be mac80211: Fix sending ADDBA response for an ongoing session -> don't send the same frame twice cfg80211: use only 1Mbps for basic rates in mesh -> interop issue with old versions of our code mac80211_hwsim: don't use WQ_MEM_RECLAIM -> it causes splats because it flushes work on a non-reclaim WQ regulatory: add NUL to request alpha2 -> nla_put_string() issue from Kees mac80211: mesh: fix wrong mesh TTL offset calculation -> protocol issue mac80211: fix a possible leak of station stats -> error path might leak memory mac80211: fix calling sleeping function in atomic context -> percpu allocations need to be made with gfp flags ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * regulatory: add NUL to request alpha2Johannes Berg2018-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to the ancient commit a5fe8e7695dc ("regulatory: add NUL to alpha2"), add another byte to alpha2 in the request struct so that when we use nla_put_string(), we don't overrun anything. Fixes: 73d54c9e74c4 ("cfg80211: add regulatory netlink multicast group") Reported-by: Kees Cook <keescook@google.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4Felix Fietkau2018-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ensures that mac80211 allocated management frames are properly aligned, which makes copying them more efficient. For instance, mt76 uses iowrite32_copy to copy beacon frames to beacon template memory on the chip. Misaligned 32-bit accesses cause CPU exceptions on MIPS and should be avoided. Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | net: fib_rules: Add new attribute to set protocolDonald Sharp2018-02-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For ages iproute2 has used `struct rtmsg` as the ancillary header for FIB rules and in the process set the protocol value to RTPROT_BOOT. Until ca56209a66 ("net: Allow a rule to track originating protocol") the kernel rules code ignored the protocol value sent from userspace and always returned 0 in notifications. To avoid incompatibility with existing iproute2, send the protocol as a new attribute. Fixes: cac56209a66 ("net: Allow a rule to track originating protocol") Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge tag 'mac80211-next-for-davem-2018-02-22' of ↵David S. Miller2018-02-223-4/+121
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== Various updates across wireless. One thing to note: I've included a new ethertype that wireless uses (ETH_P_PREAUTH) in if_ether.h. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | mac80211: Call mgd_prep_tx before transmitting deauthenticationIlan Peer2018-02-221-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In multi channel scenarios, when disassociating from the AP before a beacon was heard from the AP, it is not guaranteed that the virtual interface is granted air time for the transmission of the deauthentication frame. This in turn can lead to various issues as the AP might never get the deauthentication frame. To mitigate such possible issues, add a HW flag indicating that the driver requires mac80211 to call the mgd_prep_tx() driver callback to make sure that the virtual interface is granted immediate airtime to be able to transmit the frame, in case that no beacon was heard from the AP. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | mac80211: support reporting A-MPDU EOF bit value/knownJohannes Berg2018-02-222-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Support getting the EOF bit value reported from hardware and writing it out to radiotap. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | mac80211: Add tx ack signal support in sta infoVenkateswara Naralasetty2018-02-191-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This allows users to get ack signal strength of last transmitted frame. Signed-off-by: Venkateswara Naralasetty <vnaralas@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | cfg80211: send ack_signal to user in probe client responseVenkateswara Naralasetty2018-02-191-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides support to get ack signal in probe client response and in station info from user. Signed-off-by: Venkateswara Naralasetty <vnaralas@codeaurora.org> [squash in compilation fixes] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | cfg80211: Add support to notify station's opmode change to userspacetamizhr@codeaurora.org2018-01-311-0/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ht/vht action frames will be sent to AP from station to notify change of its ht/vht opmode(max bandwidth, smps mode or nss) modified values. Currently these valuse used by driver/firmware for rate control algorithm. This patch introduces NL80211_CMD_STA_OPMODE_CHANGED command to notify those modified/current supported values(max bandwidth, smps mode, max nss) to userspace application. This will be useful for the application like steering, which closely monitoring station's capability changes. Since the application has taken these values during station association. Signed-off-by: Tamizh chelvam <tamizhr@codeaurora.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| * | | cfg80211/nl80211: Optional authentication offload to userspaceSrinivas Dasari2018-01-311-3/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This interface allows the host driver to offload the authentication to user space. This is exclusively defined for host drivers that do not define separate commands for authentication and association, but rely on userspace SME (e.g., in wpa_supplicant for the ~WPA_DRIVER_FLAGS_SME case) for the authentication to happen. This can be used to implement SAE without full implementation in the kernel/firmware while still being able to use NL80211_CMD_CONNECT with driver-based BSS selection. Host driver sends NL80211_CMD_EXTERNAL_AUTH event to start/abort authentication to the port on which connect is triggered and status of authentication is further indicated by user space to host driver through the same command response interface. User space entities advertise this capability through the NL80211_ATTR_EXTERNAL_AUTH_SUPP flag in the NL80211_CMD_CONNECT request. Host drivers shall look at this capability to offload the authentication. Signed-off-by: Srinivas Dasari <dasaris@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com> [add socket connection ownership check] Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | net: Allow a rule to track originating protocolDonald Sharp2018-02-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow a rule that is being added/deleted/modified or dumped to contain the originating protocol's id. The protocol is handled just like a routes originating protocol is. This is especially useful because there is starting to be a plethora of different user space programs adding rules. Allow the vrf device to specify that the kernel is the originator of the rule created for this device. Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: remove the hardcode in the definition of TCPF MacroYafang Shao2018-02-211-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCPF_ macro depends on the definition of TCP_ macro. So it is better to define them with TCP_ marco. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: remove sk_check_csum_caps()Eric Dumazet2018-02-211-9/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Since TCP relies on GSO, we do not need this helper anymore. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: switch to GSO being always onEric Dumazet2018-02-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oleksandr Natalenko reported performance issues with BBR without FQ packet scheduler that were root caused to lack of SG and GSO/TSO on his configuration. In this mode, TCP internal pacing has to setup a high resolution timer for each MSS sent. We could implement in TCP a strategy similar to the one adopted in commit fefa569a9d4b ("net_sched: sch_fq: account for schedule/timers drifts") or decide to finally switch TCP stack to a GSO only mode. This has many benefits : 1) Most TCP developments are done with TSO in mind. 2) Less high-resolution timers needs to be armed for TCP-pacing 3) GSO can benefit of xmit_more hint 4) Receiver GRO is more effective (as if TSO was used for real on sender) -> Lower ACK traffic 5) Write queues have less overhead (one skb holds about 64KB of payload) 6) SACK coalescing just works. 7) rtx rb-tree contains less packets, SACK is cheaper. This patch implements the minimum patch, but we can remove some legacy code as follow ups. Tested: On 40Gbit link, one netperf -t TCP_STREAM BBR+fq: sg on: 26 Gbits/sec sg off: 15.7 Gbits/sec (was 2.3 Gbit before patch) BBR+pfifo_fast: sg on: 24.2 Gbits/sec sg off: 14.9 Gbits/sec (was 0.66 Gbit before patch !!! ) BBR+fq_codel: sg on: 24.4 Gbits/sec sg off: 15 Gbits/sec (was 0.66 Gbit before patch !!! ) Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net/mac8390: Convert to nubus_driverFinn Thain2018-02-211-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This resolves an old bug that constrained this driver to no more than one card. Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | devlink: Perform cleanup of resource_set cbArkadi Sharshevsky2018-02-201-4/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After adding size validation logic into core cleanup is required. Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Make cleanup_list and net::cleanup_list of llist typeKirill Tkhai2018-02-201-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This simplifies cleanup queueing and makes cleanup lists to use llist primitives. Since llist has its own cmpxchg() ordering, cleanup_list_lock is not more need. Also, struct llist_node is smaller, than struct list_head, so we save some bytes in struct net with this patch. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Kill net_mutexKirill Tkhai2018-02-201-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We take net_mutex, when there are !async pernet_operations registered, and read locking of net_sem is not enough. But we may get rid of taking the mutex, and just change the logic to write lock net_sem in such cases. This obviously reduces the number of lock operations, we do. Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2018-02-201-0/+1
|\ \ \ \ | | |/ / | |/| |
| * | | udplite: fix partial checksum initializationAlexey Kodanev2018-02-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since UDP-Lite is always using checksum, the following path is triggered when calculating pseudo header for it: udp4_csum_init() or udp6_csum_init() skb_checksum_init_zero_check() __skb_checksum_validate_complete() The problem can appear if skb->len is less than CHECKSUM_BREAK. In this particular case __skb_checksum_validate_complete() also invokes __skb_checksum_complete(skb). If UDP-Lite is using partial checksum that covers only part of a packet, the function will return bad checksum and the packet will be dropped. It can be fixed if we skip skb_checksum_init_zero_check() and only set the required pseudo header checksum for UDP-Lite with partial checksum before udp4_csum_init()/udp6_csum_init() functions return. Fixes: ed70fcfcee95 ("net: Call skb_checksum_init in IPv4") Fixes: e4f45b7f40bd ("net: Call skb_checksum_init in IPv6") Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: handle extack in tcf_generic_walkerAlexander Aring2018-02-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds extack handling for a common used TC act function "tcf_generic_walker()" to add an extack message on failures. The tcf_generic_walker() function can fail if get a invalid command different than DEL and GET. The naming "action" here is wrong, the correct naming would be command. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: add extack for walk callbackAlexander Aring2018-02-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds extack support for act walker callback api. This prepares to handle extack support inside each specific act implementation. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: add extack for lookup callbackAlexander Aring2018-02-161-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds extack support for act lookup callback api. This prepares to handle extack support inside each specific act implementation. Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: add extack to init callbackAlexander Aring2018-02-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds extack support for act init callback api. This prepares to handle extack support inside each specific act implementation. Based on work by David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: add extack to initAlexander Aring2018-02-161-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds extack to tcf_action_init and tcf_action_init_1 functions. These are necessary to make individual extack handling in each act implementation. Based on work by David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: fix code styleAlexander Aring2018-02-161-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is used by subsequent patches. It fixes code style issues caught by checkpatch. Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Revert sched action extack support series.David S. Miller2018-02-161-6/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | It was mis-applied and the changes had rejects. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: add extack to initAlexander Aring2018-02-161-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds extack to tcf_action_init and tcf_action_init_1 functions. These are necessary to make individual extack handling in each act implementation. Based on work by David Ahern <dsahern@gmail.com> Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: sched: act: fix code styleAlexander Aring2018-02-161-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is used by subsequent patches. It fixes code style issues caught by checkpatch. Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net/ipv4: Remove fib table id from rtableDavid Ahern2018-02-151-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove rt_table_id from rtable. It was added for getroute to return the table id that was hit in the lookup. With the changes for fibmatch the table id can be extracted from the fib_info returned in the fib_result so it no longer needs to be in rtable directly. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Move ipv4 set_lwt_redirect helper to lwtunnelDavid Ahern2018-02-141-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IPv4 uses set_lwt_redirect to set the lwtunnel redirect functions as needed. Move it to lwtunnel.h as lwtunnel_set_redirect and change IPv6 to also use it. Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: dsa: forward timestamping callbacks to switch driversBrandon Streiff2018-02-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Forward the rx/tx timestamp machinery from the dsa infrastructure to the switch driver. On the rx side, defer delivery of skbs until we have an rx timestamp. This mimicks the behavior of skb_defer_rx_timestamp. On the tx side, identify PTP packets, clone them, and pass them to the underlying switch driver before we transmit. This mimicks the behavior of skb_tx_timestamp. Adjusted txstamp API to keep the allocation and freeing of the clone in the same central function by Richard Cochran Signed-off-by: Brandon Streiff <brandon.streiff@ni.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: dsa: forward hardware timestamping ioctls to switch driverBrandon Streiff2018-02-141-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to the dsa slave network device so that switch drivers can implement the SIOC[GS]HWTSTAMP ioctls and the ethtool timestamp-info interface. Signed-off-by: Brandon Streiff <brandon.streiff@ni.com> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: try to keep packet if SYN_RCV race is lostEric Dumazet2018-02-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 배석진 reported that in some situations, packets for a given 5-tuple end up being processed by different CPUS. This involves RPS, and fragmentation. 배석진 is seeing packet drops when a SYN_RECV request socket is moved into ESTABLISH state. Other states are protected by socket lock. This is caused by a CPU losing the race, and simply not caring enough. Since this seems to occur frequently, we can do better and perform a second lookup. Note that all needed memory barriers are already in the existing code, thanks to the spin_lock()/spin_unlock() pair in inet_ehash_insert() and reqsk_put(). The second lookup must find the new socket, unless it has already been accepted and closed by another cpu. Note that the fragmentation could be avoided in the first place by use of a correct TCP MSS option in the SYN{ACK} packet, but this does not mean we can not be more robust. Many thanks to 배석진 for a very detailed analysis. Reported-by: 배석진 <soukjin.bae@samsung.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Make ax25_ptr depend on CONFIG_AX25David Ahern2018-02-141-0/+2
| | | | | | | | | | | | | | | | | | | | Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Allow pernet_operations to be executed in parallelKirill Tkhai2018-02-131-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds new pernet_operations::async flag to indicate operations, which ->init(), ->exit() and ->exit_batch() methods are allowed to be executed in parallel with the methods of any other pernet_operations. When there are only asynchronous pernet_operations in the system, net_mutex won't be taken for a net construction and destruction. Also, remove BUG_ON(mutex_is_locked()) from net_assign_generic() without replacing with the equivalent net_sem check, as there is one more lockdep assert below. v3: Add comment near net_mutex. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: make getname() functions return length rather than use int* parameterDenys Vlasenko2018-02-123-3/+3
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changes since v1: Added changes in these files: drivers/infiniband/hw/usnic/usnic_transport.c drivers/staging/lustre/lnet/lnet/lib-socket.c drivers/target/iscsi/iscsi_target_login.c drivers/vhost/net.c fs/dlm/lowcomms.c fs/ocfs2/cluster/tcp.c security/tomoyo/network.c Before: All these functions either return a negative error indicator, or store length of sockaddr into "int *socklen" parameter and return zero on success. "int *socklen" parameter is awkward. For example, if caller does not care, it still needs to provide on-stack storage for the value it does not need. None of the many FOO_getname() functions of various protocols ever used old value of *socklen. They always just overwrite it. This change drops this parameter, and makes all these functions, on success, return length of sockaddr. It's always >= 0 and can be differentiated from an error. Tests in callers are changed from "if (err)" to "if (err < 0)", where needed. rpc_sockname() lost "int buflen" parameter, since its only use was to be passed to kernel_getsockname() as &buflen and subsequently not used in any way. Userspace API is not changed. text data bss dec hex filename 30108430 2633624 873672 33615726 200ef6e vmlinux.before.o 30108109 2633612 873672 33615393 200ee21 vmlinux.o Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> CC: David S. Miller <davem@davemloft.net> CC: linux-kernel@vger.kernel.org CC: netdev@vger.kernel.org CC: linux-bluetooth@vger.kernel.org CC: linux-decnet-user@lists.sourceforge.net CC: linux-wireless@vger.kernel.org CC: linux-rdma@vger.kernel.org CC: linux-sctp@vger.kernel.org CC: linux-nfs@vger.kernel.org CC: linux-x25@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* | | vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds2018-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller2018-02-091-0/+8
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Daniel Borkmann says: ==================== pull-request: bpf 2018-02-09 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Two fixes for BPF sockmap in order to break up circular map references from programs attached to sockmap, and detaching related sockets in case of socket close() event. For the latter we get rid of the smap_state_change() and plug into ULP infrastructure, which will later also be used for additional features anyway such as TX hooks. For the second issue, dependency chain is broken up via map release callback to free parse/verdict programs, all from John. 2) Fix a libbpf relocation issue that was found while implementing XDP support for Suricata project. Issue was that when clang was invoked with default target instead of bpf target, then various other e.g. debugging relevant sections are added to the ELF file that contained relocation entries pointing to non-BPF related sections which libbpf trips over instead of skipping them. Test cases for libbpf are added as well, from Jesper. 3) Various misc fixes for bpftool and one for libbpf: a small addition to libbpf to make sure it recognizes all standard section prefixes. Then, the Makefile in bpftool/Documentation is improved to explicitly check for rst2man being installed on the system as we otherwise risk installing empty man pages; the man page for bpftool-map is corrected and a set of missing bash completions added in order to avoid shipping bpftool where the completions are only partially working, from Quentin. 4) Fix applying the relocation to immediate load instructions in the nfp JIT which were missing a shift, from Jakub. 5) Two fixes for the BPF kernel selftests: handle CONFIG_BPF_JIT_ALWAYS_ON=y gracefully in test_bpf.ko module and mark them as FLAG_EXPECTED_FAIL in this case; and explicitly delete the veth devices in the two tests test_xdp_{meta,redirect}.sh before dismantling the netnses as when selftests are run in batch mode, then workqueue to handle destruction might not have finished yet and thus veth creation in next test under same dev name would fail, from Yonghong. 6) Fix test_kmod.sh to check the test_bpf.ko module path before performing an insmod, and fallback to modprobe. Especially the latter is useful when having a device under test that has the modules installed instead, from Naresh. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bpf: sockmap, add sock close() hook to remove socksJohn Fastabend2018-02-061-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The selftests test_maps program was leaving dangling BPF sockmap programs around because not all psock elements were removed from the map. The elements in turn hold a reference on the BPF program they are attached to causing BPF programs to stay open even after test_maps has completed. The original intent was that sk_state_change() would be called when TCP socks went through TCP_CLOSE state. However, because socks may be in SOCK_DEAD state or the sock may be a listening socket the event is not always triggered. To resolve this use the ULP infrastructure and register our own proto close() handler. This fixes the above case. Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support") Reported-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
| * | | net: add a UID to use for ULP socket assignmentJohn Fastabend2018-02-061-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create a UID field and enum that can be used to assign ULPs to sockets. This saves a set of string comparisons if the ULP id is known. For sockmap, which is added in the next patches, a ULP is used to hook into TCP sockets close state. In this case the ULP being added is done at map insert time and the ULP is known and done on the kernel side. In this case the named lookup is not needed. Because we don't want to expose psock internals to user space socket options a user visible flag is also added. For TLS this is set for BPF it will be cleared. Alos remove pr_notice, user gets an error code back and should check that rather than rely on logs. Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2018-02-072-6/+5Star
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for you net tree, they are: 1) Restore __GFP_NORETRY in xt_table allocations to mitigate effects of large memory allocation requests, from Michal Hocko. 2) Release IPv6 fragment queue in case of error in fragmentation header, this is a follow up to amend patch 83f1999caeb1, from Subash Abhinov Kasiviswanathan. 3) Flowtable infrastructure depends on NETFILTER_INGRESS as it registers a hook for each flowtable, reported by John Crispin. 4) Missing initialization of info->priv in xt_cgroup version 1, from Cong Wang. 5) Give a chance to garbage collector to run after scheduling flowtable cleanup. 6) Releasing flowtable content on nft_flow_offload module removal is not required at all, there is not dependencies between this module and flowtables, remove it. 7) Fix missing xt_rateest_mutex grabbing for hash insertions, also from Cong Wang. 8) Move nf_flow_table_cleanup() routine to flowtable core, this patch is a dependency for the next patch in this list. 9) Flowtable resources are not properly released on removal from the control plane. Fix this resource leak by scheduling removal of all entries and explicit call to the garbage collector. 10) nf_ct_nat_offset() declaration is dead code, this function prototype is not used anywhere, remove it. From Taehee Yoo. 11) Fix another flowtable resource leak on entry insertion failures, this patch also fixes a possible use-after-free. Patch from Felix Fietkau. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>