summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* net: phy: Add phy loopback support in net phy frameworkLin Yun Sheng2017-07-033-0/+57
| | | | | | | | | | | | This patch add set_loopback in phy_driver, which is used by MAC driver to enable or disable phy loopback. it also add a generic genphy_loopback function, which use BMCR loopback bit to enable or disable loopback. Signed-off-by: Lin Yun Sheng <linyunsheng@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/mlx5: fix memcpy limit?Stephen Rothwell2017-07-031-1/+1
| | | | | Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: dad: don't remove dynamic addresses if link is downSabrina Dubroca2017-07-031-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1): ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 In the same situation, this will succeed and not remove the address (2): ip addr add 1111::12/64 dev $DEV ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why: * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses. Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: cdc_ncm: Reduce memory use when kernel memory lowJim Baxter2017-07-032-12/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CDC-NCM driver can require large amounts of memory to create skb's and this can be a problem when the memory becomes fragmented. This especially affects embedded systems that have constrained resources but wish to maximise the throughput of CDC-NCM with 16KiB NTB's. The issue is after running for a while the kernel memory can become fragmented and it needs compacting. If the NTB allocation is needed before the memory has been compacted the atomic allocation can fail which can cause increased latency, large re-transmissions or disconnections depending upon the data being transmitted at the time. This situation occurs for less than a second until the kernel has compacted the memory but the failed devices can take a lot longer to recover from the failed TX packets. To ease this temporary situation I modified the CDC-NCM TX path to temporarily switch into a reduced memory mode which allocates an NTB that will fit into a USB_CDC_NCM_NTB_MIN_OUT_SIZE (default 2048 Bytes) sized memory block and only transmit NTB's with a single network frame until the memory situation is resolved. Each time this issue occurs we wait for an increasing number of reduced size allocations before requesting a full size one to not put additional pressure on a low memory system. Once the memory is compacted the CDC-NCM data can resume transmitting at the normal tx_max rate once again. Signed-off-by: Jim Baxter <jim_baxter@mentor.com> Reviewed-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'qed-Add-iWARP-support-for-QL4xxxx'David S. Miller2017-07-0318-100/+3008
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michal Kalderon says: ==================== qed: Add iWARP support for QL4xxxx This patch series adds iWARP support to our QL4xxxx networking adapters. The code changes span across qed and qedr drivers, but this series contains changes to qed only. Once the series is accepted, the qedr series will be submitted to the rdma tree. There is one additional qed patch which enables the iWARP, this patch is delayed until the qedr series will be accepted. The patches were previously sent as an RFC, and these are the first 12 patches in the RFC series: https://www.spinics.net/lists/linux-rdma/msg51416.html This series was tested and built against net-next. MAINTAINERS file is not updated in this PATCH as there is a pending patch for qedr driver update https://patchwork.kernel.org/patch/9752761. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Add iWARP support for physical queue allocationKalderon, Michal2017-07-031-0/+4
| | | | | | | | | | | | | | | | | | iWARP has different physical queue requirements than RoCE Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Add iWARP protocol support in context allocationKalderon, Michal2017-07-031-2/+11
| | | | | | | | | | | | | | | | | | | | When computing how much memory is required for the different hw clients iWARP protocol should be taken into account Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: iWARP CM add error handlingKalderon, Michal2017-07-032-2/+190
| | | | | | | | | | | | | | | | | | | | This patch introduces error handling for errors that occurred during connection establishment. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: iWARP implement disconnect flowsKalderon, Michal2017-07-032-1/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch takes care of active/passive disconnect flows. Disconnect flows can be initiated remotely, in which case a async event will arrive from peer and indicated to qedr driver. These are referred to as exceptions. When a QP is destroyed, it needs to check that it's associated ep has been closed. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: iWARP CM add active side connectKalderon, Michal2017-07-034-12/+265
| | | | | | | | | | | | | | | | | | | | | | | | This patch implements the active side connect. Offload a connection, process MPA reply and send RTR. In some of the common passive/active functions, the active side will work in blocking mode. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: iWARP CM add passive side connectKalderon, Michal2017-07-039-20/+1039
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the passive side connect. It addresses pre-allocating resources, creating a connection element upon valid SYN packet received. Calling upper layer and implementation of the accept/reject calls. Error handling is not part of this patch. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: iWARP CM add listener functions and initial SYN processingKalderon, Michal2017-07-034-3/+343
| | | | | | | | | | | | | | | | | | | | | | This patch adds the ability to add and remove listeners and identify whether the SYN packet received is intended for iWARP or not. If a listener is not found the SYN packet is posted back to the chip. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: iWARP CM - setup a ll2 connection for handling SYN packetsKalderon, Michal2017-07-032-3/+220
| | | | | | | | | | | | | | | | | | | | | | iWARP handles incoming SYN packets using the ll2 interface. This patch implements ll2 setup and teardown. Additional ll2 connections will be used in the future which are not part of this patch series. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Add iWARP support in ll2 connectionsKalderon, Michal2017-07-032-2/+12
| | | | | | | | | | | | | | | | | | | | Add a new connection type for iWARP ll2 connections for setting correct ll2 filters and connection type to FW. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Rename some ll2 related definesKalderon, Michal2017-07-033-17/+16Star
| | | | | | | | | | | | | | | | | | Make some names more generic as they will be used by iWARP too. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Implement iWARP initialization, teardown and qp operationsKalderon, Michal2017-07-0311-40/+803
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds iWARP support for flows that have common code between RoCE and iWARP, such as initialization, teardown and qp setup verbs: create, destroy, modify, query. It introduces the iWARP specific files qed_iwarp.[ch] and iwarp_common.h Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * qed: Introduce iWARP personalityKalderon, Michal2017-07-037-27/+43
|/ | | | | | | | | | | iWARP personality introduced the need for differentiating in several places in the code whether we are RoCE, iWARP or either. This leads to introducing new macros for querying the personality. Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bpf: fix to bpf_setsockopsLawrence Brakmo2017-07-031-2/+1Star
| | | | | | | | Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1 statement up). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'bpf-Add-support-for-sock_ops'David S. Miller2017-07-0225-26/+1218
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lawrence Brakmo says: ==================== bpf: Add support for sock_ops Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.) and setting connection parameters such as buffer sizes, initial window, SYN/SYN-ACK RTOs, etc. Unlike current BPF program types that expect to be called at a particular place in the network stack code, SOCK_OPS program can be called at different places and use an "op" field to indicate the context. There are currently two types of operations, those whose effect is through their return value and those whose effect is through the new bpf_setsocketop BPF helper function. Example operands of the first type are: BPF_SOCK_OPS_TIMEOUT_INIT BPF_SOCK_OPS_RWND_INIT BPF_SOCK_OPS_NEEDS_ECN Example operands of the secont type are: BPF_SOCK_OPS_TCP_CONNECT_CB BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB Current operands are only called during connection establishment so there should not be any BPF overheads after connection establishment. The main idea is to use connection information form both hosts, such as IP addresses and ports to allow setting of per connection parameters to optimize the connection's peformance. Alghough there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some disticnt advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows). Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it does not require application changes and it can be updated easily at any time. It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called only once during the connections's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has "op" field to specify the type of operation requested. This patch set also includes sample BPF programs to demostrate the differnet features. v2: Formatting changes, rebased to latest net-next v3: Fixed build issues, changed socket_ops to sock_ops throught, fixed formatting issues, removed the syscall to load sock_ops program and added functionality to use existing bpf attach and bpf detach system calls, removed reader/writer locks in sock_bpfops.c (used when saving sock_ops global program) and fixed missing module refcount increment. v4: Removed global sock_ops program and instead used existing cgroup bpf infrastructure to support a new BPF_CGROUP_ATTCH type. v5: fixed kbuild warning happening in bpf-cgroup.h removed automatic converstion to host byte order from some sock_ops fields (ipv4 and ipv6 addresses, remote port) Added conversion to host byte order in some of the sample programs Added to sample BPF program comments about using load_sock_ops to load Removed is_req_sock field from bpf_sock_ops_kern and related places, using sk_fullsock() instead. v6: fixes to BPF helper function setsockopt (possible NULL deferencing, etc.) ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: update tools/include/uapi/linux/bpf.hLawrence Brakmo2017-07-021-1/+65
| | | | | | | | | | | | | | | | Update tools/include/uapi/linux/bpf.h to include changes related to new bpf sock_ops program type. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Sample bpf program to set sndcwnd clampLawrence Brakmo2017-07-022-0/+103
| | | | | | | | | | | | | | | | | | | | | | | | Sample BPF program, tcp_clamp_kern.c, to demostrate the use of setting the sndcwnd clamp. This program assumes that if the first 5.5 bytes of the host's IPv6 addresses are the same, then the hosts are in the same datacenter and sets sndcwnd clamp to 100 packets, SYN and SYN-ACK RTOs to 10ms and send/receive buffer sizes to 150KB. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Adds support for setting sndcwnd clampLawrence Brakmo2017-07-022-0/+8
| | | | | | | | | | | | | | | | | | Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which sets the initial congestion window. It is useful to limit the sndcwnd when the host are close to each other (small RTT). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Sample BPF program to set initial cwndLawrence Brakmo2017-07-022-0/+89
| | | | | | | | | | | | | | | | | | | | | | | | Sample BPF program that assumes hosts are far away (i.e. large RTTs) and sets initial cwnd and initial receive window to 40 packets, send and receive buffers to 1.5MB. In practice there would be a test to insure the hosts are actually far enough away. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Adds support for setting initial cwndLawrence Brakmo2017-07-022-1/+19
| | | | | | | | | | | | | | | | | | Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the initial congestion window. This can be used when the hosts are far apart (large RTTs) and it is safe to start with a large inital cwnd. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Sample BPF program to set congestion controlLawrence Brakmo2017-07-022-0/+84
| | | | | | | | | | | | | | | | | | Sample BPF program that sets congestion control to dctcp when both hosts are within the same datacenter. In this example that is assumed to be when they have the first 5.5 bytes of their IPv6 address are the same. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Add support for changing congestion controlLawrence Brakmo2017-07-027-17/+58
| | | | | | | | | | | | | | | | | | | | | | Added support for changing congestion control for SOCK_OPS bpf programs through the setsockopt bpf helper function. It also adds a new SOCK_OPS op, BPF_SOCK_OPS_NEEDS_ECN, that is needed for congestion controls, like dctcp, that need to enable ECN in the SYN packets. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Sample BPF program to set buffer sizesLawrence Brakmo2017-07-022-0/+87
| | | | | | | | | | | | | | | | | | | | This patch contains a BPF program to set initial receive window to 40 packets and send and receive buffers to 1.5MB. This would usually be done after doing appropriate checks that indicate the hosts are far enough away (i.e. large RTT). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Add TCP connection BPF callbacksLawrence Brakmo2017-07-024-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | Added callbacks to BPF SOCK_OPS type program before an active connection is intialized and after a passive or active connection is established. The following patch demostrates how they can be used to set send and receive buffer sizes. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Add setsockopt helper function to bpfLawrence Brakmo2017-07-023-2/+94
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added support for calling a subset of socket setsockopts from BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather than making the changes to call the socket setsockopt function because the changes required would have been larger. The ops supported are: SO_RCVBUF SO_SNDBUF SO_MAX_PACING_RATE SO_PRIORITY SO_RCVLOWAT SO_MARK Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Sample bpf program to set initial windowLawrence Brakmo2017-07-022-0/+70
| | | | | | | | | | | | | | | | | | | | | | The sample bpf program, tcp_rwnd_kern.c, sets the initial advertized window to 40 packets in an environment where distinct IPv6 prefixes indicate that both hosts are not in the same data center. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Support for setting initial receive windowLawrence Brakmo2017-07-024-2/+28
| | | | | | | | | | | | | | | | | | This patch adds suppport for setting the initial advertized window from within a BPF_SOCK_OPS program. This can be used to support larger initial cwnd values in environments where it is known to be safe. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Sample bpf program to set SYN/SYN-ACK RTOsLawrence Brakmo2017-07-022-0/+70
| | | | | | | | | | | | | | | | | | | | | | The sample BPF program, tcp_synrto_kern.c, sets the SYN and SYN-ACK RTOs to 10ms when both hosts are within the same datacenter (i.e. small RTTs) in an environment where common IPv6 prefixes indicate both hosts are in the same data center. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: Support for per connection SYN/SYN-ACK RTOsLawrence Brakmo2017-07-024-2/+17
| | | | | | | | | | | | | | | | | | | | This patch adds support for setting a per connection SYN and SYN_ACK RTOs from within a BPF_SOCK_OPS program. For example, to set small RTOs when it is known both hosts are within a datacenter. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: program to load and attach sock_ops BPF progsLawrence Brakmo2017-07-022-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The program load_sock_ops can be used to load sock_ops bpf programs and to attach it to an existing (v2) cgroup. It can also be used to detach sock_ops programs. Examples: load_sock_ops [-l] <cg-path> <prog filename> Load and attaches a sock_ops program at the specified cgroup. If "-l" is used, the program will continue to run to output the BPF log buffer. If the specified filename does not end in ".o", it appends "_kern.o" to the name. load_sock_ops -r <cg-path> Detaches the currently attached sock_ops program from the specified cgroup. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bpf: BPF support for sock_opsLawrence Brakmo2017-07-029-3/+314
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.). It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. The program will be called at appropriate times to set relevant connections parameters such as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection information such as IP addresses, port numbers, etc. Alghough there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some distinct advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows). Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it oes not require application changes and it can be updated easily at any time. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called only once during the connections's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has "op" field to specify the type of operation requested. The purpose of this new program type is to simplify setting connection parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is easy to use facebook's internal IPv6 addresses to determine if both hosts of a connection are in the same datacenter. Therefore, it is easy to write a BPF program to choose a small SYN RTO value when both hosts are in the same datacenter. This patch only contains the framework to support the new BPF program type, following patches add the functionality to set various connection parameters. This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS and a new bpf syscall command to load a new program of this type: BPF_PROG_LOAD_SOCKET_OPS. Two new corresponding structs (one for the kernel one for the user/BPF program): /* kernel version */ struct bpf_sock_ops_kern { struct sock *sk; __u32 op; union { __u32 reply; __u32 replylong[4]; }; }; /* user version * Some fields are in network byte order reflecting the sock struct * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to * convert them to host byte order. */ struct bpf_sock_ops { __u32 op; union { __u32 reply; __u32 replylong[4]; }; __u32 family; __u32 remote_ip4; /* In network byte order */ __u32 local_ip4; /* In network byte order */ __u32 remote_ip6[4]; /* In network byte order */ __u32 local_ip6[4]; /* In network byte order */ __u32 remote_port; /* In network byte order */ __u32 local_port; /* In host byte horder */ }; Currently there are two types of ops. The first type expects the BPF program to return a value which is then used by the caller (or a negative value to indicate the operation is not supported). The second type expects state changes to be done by the BPF program, for example through a setsockopt BPF helper function, and they ignore the return value. The reply fields of the bpf_sockt_ops struct are there in case a bpf program needs to return a value larger than an integer. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'for-upstream' of ↵David S. Miller2017-07-0210-39/+60
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-07-01 Here are some more Bluetooth patches for the 4.13 kernel: - Added support for Broadcom BCM43430 controllers - Added sockaddr length checks before accessing sa_family - Fixed possible "might sleep" errors in bnep, cmtp and hidp modules - A few other minor fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * Bluetooth: btbcm: Add entry for BCM43430 UART bluetoothIan Molton2017-06-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | This patch adds the device ID for the bluetooth chip used in the Broadcom BCM43430 SDIO WiFi / UART BT chip. Successfully tested using Firmware version 0x0182 Signed-off-by: Ian Molton <ian@mnementh.co.uk> Reported-by: Loic Poulain <loic.poulain@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * Bluetooth: Add sockaddr length checks before accessing sa_family in bind and ↵Mateusz Jurczyk2017-06-293-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | connect handlers Verify that the caller-provided sockaddr structure is large enough to contain the sa_family field, before accessing it in bind() and connect() handlers of the Bluetooth sockets. Since neither syscall enforces a minimum size of the corresponding memory region, very short sockaddrs (zero or one byte long) result in operating on uninitialized memory while referencing sa_family. Signed-off-by: Mateusz Jurczyk <mjurczyk@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * bluetooth: remove WQ_MEM_RECLAIM from hci workqueuesTejun Heo2017-06-291-4/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bluetooth hci uses ordered HIGHPRI, MEM_RECLAIM workqueues. It's likely that the flags came from mechanical conversion from create_singlethread_workqueue(). Bluetooth shouldn't be depended upon for memory reclaim and the spurious MEM_RECLAIM flag can trigger the following warning. Remove WQ_MEM_RECLAIM and convert to alloc_ordered_workqueue() while at it. workqueue: WQ_MEM_RECLAIM hci0:hci_power_off is flushing !WQ_MEM_RECLAIM events:btusb_work ------------[ cut here ]------------ WARNING: CPU: 2 PID: 14231 at /home/brodo/local/kernel/git/linux/kernel/workqueue.c:2423 check_flush_dependency+0xb3/0x100 Modules linked in: CPU: 2 PID: 14231 Comm: kworker/u9:4 Not tainted 4.12.0-rc6+ #3 Hardware name: Dell Inc. XPS 13 9343/0TM99H, BIOS A11 12/08/2016 Workqueue: hci0 hci_power_off task: ffff9432dad58000 task.stack: ffff986d43790000 RIP: 0010:check_flush_dependency+0xb3/0x100 RSP: 0018:ffff986d43793c90 EFLAGS: 00010086 RAX: 000000000000005a RBX: ffff943316810820 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000096 RDI: 0000000000000001 RBP: ffff986d43793cb0 R08: 0000000000000775 R09: ffffffff85bdd5c0 R10: 0000000000000040 R11: 0000000000000000 R12: ffffffff84d596e0 R13: ffff9432dad58000 R14: ffff94321c640320 R15: ffff9432dad58000 FS: 0000000000000000(0000) GS:ffff94331f500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007b8bca242000 CR3: 000000014f60a000 CR4: 00000000003406e0 Call Trace: flush_work+0x8a/0x1c0 ? flush_work+0x184/0x1c0 ? skb_free_head+0x21/0x30 __cancel_work_timer+0x124/0x1b0 ? hci_dev_do_close+0x2a4/0x4d0 cancel_work_sync+0x10/0x20 btusb_close+0x23/0x100 hci_dev_do_close+0x2ca/0x4d0 hci_power_off+0x1e/0x50 process_one_work+0x184/0x3e0 worker_thread+0x4a/0x3a0 ? preempt_count_sub+0x9b/0x100 ? preempt_count_sub+0x9b/0x100 kthread+0x125/0x140 ? process_one_work+0x3e0/0x3e0 ? __kthread_create_on_node+0x1a0/0x1a0 ? do_syscall_64+0x58/0xd0 ret_from_fork+0x27/0x40 Code: 00 75 bf 49 8b 56 18 48 8d 8b b0 00 00 00 48 81 c6 b0 00 00 00 4d 89 e0 48 c7 c7 20 23 6b 85 c6 05 83 cd 31 01 01 e8 bf c4 0c 00 <0f> ff eb 93 80 3d 74 cd 31 01 00 75 a5 65 48 8b 04 25 00 c5 00 ---[ end trace b88fd2f77754bfec ]--- Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * Bluetooth: hci_bcm: Add active_low irq polarity quirk for Asus T100CHIHans de Goede2017-06-291-0/+9
| | | | | | | | | | | | | | | | | | | | Just like the T100TA the host-wake irq on the Asus T100CHI is active low. Having a quirk for this is actually extra important on the T100CHI as it ships with a bluetooth keyboard dock, which does not work properly without this quirk. Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * Bluetooth: hidp: fix possible might sleep error in hidp_session_threadJeffy Chen2017-06-271-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It looks like hidp_session_thread has same pattern as the issue reported in old rfcomm: while (1) { set_current_state(TASK_INTERRUPTIBLE); if (condition) break; // may call might_sleep here schedule(); } __set_current_state(TASK_RUNNING); Which fixed at: dfb2fae Bluetooth: Fix nested sleeps So let's fix it at the same way, also follow the suggestion of: https://lwn.net/Articles/628628/ Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Tested-by: AL Yu-Chen Cho <acho@suse.com> Tested-by: Rohit Vaswani <rvaswani@nvidia.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * Bluetooth: cmtp: fix possible might sleep error in cmtp_sessionJeffy Chen2017-06-271-7/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It looks like cmtp_session has same pattern as the issue reported in old rfcomm: while (1) { set_current_state(TASK_INTERRUPTIBLE); if (condition) break; // may call might_sleep here schedule(); } __set_current_state(TASK_RUNNING); Which fixed at: dfb2fae Bluetooth: Fix nested sleeps So let's fix it at the same way, also follow the suggestion of: https://lwn.net/Articles/628628/ Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Reviewed-by: Brian Norris <briannorris@chromium.org> Reviewed-by: AL Yu-Chen Cho <acho@suse.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * Bluetooth: bnep: fix possible might sleep error in bnep_sessionJeffy Chen2017-06-271-6/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It looks like bnep_session has same pattern as the issue reported in old rfcomm: while (1) { set_current_state(TASK_INTERRUPTIBLE); if (condition) break; // may call might_sleep here schedule(); } __set_current_state(TASK_RUNNING); Which fixed at: dfb2fae Bluetooth: Fix nested sleeps So let's fix it at the same way, also follow the suggestion of: https://lwn.net/Articles/628628/ Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com> Reviewed-by: Brian Norris <briannorris@chromium.org> Reviewed-by: AL Yu-Chen Cho <acho@suse.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * Bluetooth: hci_bcm: Fix unwanted error reporting if no bcm devLoic Poulain2017-06-271-2/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hci_bcm proto is able to operate without bcm platform device linked to its uart port. In that case, firmware can be applied, but there is no power operation (no gpio/irq resources mgmt). However, the current implementation breaks this use case because of reporting a ENODEV error in the bcm setup procedure if bcm_request_irq fails (which is the case if no bcm device linked). Fix this by removing bcm_request_irq error forwarding. Signed-off-by: Loic Poulain <loic.poulain@intel.com> Reported-by: Ian Molton <ian@mnementh.co.uk> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
| * Bluetooth: hci_serdev: make hci_serdev_client_ops staticColin Ian King2017-06-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | The structure hci_serdev_client_ops does not need to be in global scope and is not modified, so make it static. Cleans up sparse warning: "symbol 'hci_serdev_client_ops' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
* | sctp: Add peeloff-flags socket optionNeil Horman2017-07-022-15/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on a request raised on the sctp devel list, there is a need to augment the sctp_peeloff operation while specifying the O_CLOEXEC and O_NONBLOCK flags (simmilar to the socket syscall). Since modifying the SCTP_SOCKOPT_PEELOFF socket option would break user space ABI for existing programs, this patch creates a new socket option SCTP_SOCKOPT_PEELOFF_FLAGS, which accepts a third flags parameter to allow atomic assignment of the socket descriptor flags. Tested successfully by myself and the requestor Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: Andreas Steinmetz <ast@domdv.de> CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'sfc-MCDI-cleanups'David S. Miller2017-07-021-3/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Edward Cree says: ==================== sfc: small MCDI cleanups Giving the full MCDI event rather than just the code can aid in debugging. While fixing this I noticed an outdated comment. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sfc: correct comment on efx_mcdi_process_eventEdward Cree2017-07-021-1/+1
| | | | | | | | | | | | | | | | | | | | | Fix out-of-date comment. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sfc: change Unknown MCDI event message to print full event.Jon Cooper2017-07-021-2/+3
|/ / | | | | | | | | Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx5: fix spelling mistake: "Allodating" -> "Allocating"Colin Ian King2017-07-011-1/+1
| | | | | | | | | | | | | | | | Trivial fix to spelling mistake in mlx5_core_dbg debug message Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>