summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* kbuild: disable clang's default use of -fmerge-all-constantsDaniel Borkmann2018-03-211-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prasad reported that he has seen crashes in BPF subsystem with netd on Android with arm64 in the form of (note, the taint is unrelated): [ 4134.721483] Unable to handle kernel paging request at virtual address 800000001 [ 4134.820925] Mem abort info: [ 4134.901283] Exception class = DABT (current EL), IL = 32 bits [ 4135.016736] SET = 0, FnV = 0 [ 4135.119820] EA = 0, S1PTW = 0 [ 4135.201431] Data abort info: [ 4135.301388] ISV = 0, ISS = 0x00000021 [ 4135.359599] CM = 0, WnR = 0 [ 4135.470873] user pgtable: 4k pages, 39-bit VAs, pgd = ffffffe39b946000 [ 4135.499757] [0000000800000001] *pgd=0000000000000000, *pud=0000000000000000 [ 4135.660725] Internal error: Oops: 96000021 [#1] PREEMPT SMP [ 4135.674610] Modules linked in: [ 4135.682883] CPU: 5 PID: 1260 Comm: netd Tainted: G S W 4.14.19+ #1 [ 4135.716188] task: ffffffe39f4aa380 task.stack: ffffff801d4e0000 [ 4135.731599] PC is at bpf_prog_add+0x20/0x68 [ 4135.741746] LR is at bpf_prog_inc+0x20/0x2c [ 4135.751788] pc : [<ffffff94ab7ad584>] lr : [<ffffff94ab7ad638>] pstate: 60400145 [ 4135.769062] sp : ffffff801d4e3ce0 [...] [ 4136.258315] Process netd (pid: 1260, stack limit = 0xffffff801d4e0000) [ 4136.273746] Call trace: [...] [ 4136.442494] 3ca0: ffffff94ab7ad584 0000000060400145 ffffffe3a01bf8f8 0000000000000006 [ 4136.460936] 3cc0: 0000008000000000 ffffff94ab844204 ffffff801d4e3cf0 ffffff94ab7ad584 [ 4136.479241] [<ffffff94ab7ad584>] bpf_prog_add+0x20/0x68 [ 4136.491767] [<ffffff94ab7ad638>] bpf_prog_inc+0x20/0x2c [ 4136.504536] [<ffffff94ab7b5d08>] bpf_obj_get_user+0x204/0x22c [ 4136.518746] [<ffffff94ab7ade68>] SyS_bpf+0x5a8/0x1a88 Android's netd was basically pinning the uid cookie BPF map in BPF fs (/sys/fs/bpf/traffic_cookie_uid_map) and later on retrieving it again resulting in above panic. Issue is that the map was wrongly identified as a prog! Above kernel was compiled with clang 4.0, and it turns out that clang decided to merge the bpf_prog_iops and bpf_map_iops into a single memory location, such that the two i_ops could then not be distinguished anymore. Reason for this miscompilation is that clang has the more aggressive -fmerge-all-constants enabled by default. In fact, clang source code has a comment about it in lib/AST/ExprConstant.cpp on why it is okay to do so: Pointers with different bases cannot represent the same object. (Note that clang defaults to -fmerge-all-constants, which can lead to inconsistent results for comparisons involving the address of a constant; this generally doesn't matter in practice.) The issue never appeared with gcc however, since gcc does not enable -fmerge-all-constants by default and even *explicitly* states in it's option description that using this flag results in non-conforming behavior, quote from man gcc: Languages like C or C++ require each variable, including multiple instances of the same variable in recursive calls, to have distinct locations, so using this option results in non-conforming behavior. There are also various clang bug reports open on that matter [1], where clang developers acknowledge the non-conforming behavior, and refer to disabling it with -fno-merge-all-constants. But even if this gets fixed in clang today, there are already users out there that triggered this. Thus, fix this issue by explicitly adding -fno-merge-all-constants to the kernel's Makefile to generically disable this optimization, since potentially other places in the kernel could subtly break as well. Note, there is also a flag called -fmerge-constants (not supported by clang), which is more conservative and only applies to strings and it's enabled in gcc's -O/-O2/-O3/-Os optimization levels. In gcc's code, the two flags -fmerge-{all-,}constants share the same variable internally, so when disabling it via -fno-merge-all-constants, then we really don't merge any const data (e.g. strings), and text size increases with gcc (14,927,214 -> 14,942,646 for vmlinux.o). $ gcc -fverbose-asm -O2 foo.c -S -o foo.S -> foo.S lists -fmerge-constants under options enabled $ gcc -fverbose-asm -O2 -fno-merge-all-constants foo.c -S -o foo.S -> foo.S doesn't list -fmerge-constants under options enabled $ gcc -fverbose-asm -O2 -fno-merge-all-constants -fmerge-constants foo.c -S -o foo.S -> foo.S lists -fmerge-constants under options enabled Thus, as a workaround we need to set both -fno-merge-all-constants *and* -fmerge-constants in the Makefile in order for text size to stay as is. [1] https://bugs.llvm.org/show_bug.cgi?id=18538 Reported-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chenbo Feng <fengc@google.com> Cc: Richard Smith <richard-llvm@metafoo.co.uk> Cc: Chandler Carruth <chandlerc@gmail.com> Cc: linux-kernel@vger.kernel.org Tested-by: Prasad Sodagudi <psodagud@codeaurora.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
* bpf: skip unnecessary capability checkChenbo Feng2018-03-201-1/+1
| | | | | | | | | | | | | | The current check statement in BPF syscall will do a capability check for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This code path will trigger unnecessary security hooks on capability checking and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN access. This can be resolved by simply switch the order of the statement and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is allowed. Signed-off-by: Chenbo Feng <fengc@google.com> Acked-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* trace/bpf: remove helper bpf_perf_prog_read_value from tracepoint type programsYonghong Song2018-03-201-28/+40
| | | | | | | | | | | | | | | | | | | Commit 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") added helper bpf_perf_prog_read_value so that perf_event type program can read event counter and enabled/running time. This commit, however, introduced a bug which allows this helper for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value needs to access perf_event through its bpf_perf_event_data_kern type context, which is not available for tracepoint type program. This patch fixed the issue by separating bpf_func_proto between tracepoint and perf_event type programs and removed bpf_perf_prog_read_value from tracepoint func prototype. Fixes: 4bebdc7a85aa ("bpf: add helper bpf_perf_prog_read_value") Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* test_bpf: Fix testing with CONFIG_BPF_JIT_ALWAYS_ON=y on other archesThadeu Lima de Souza Cascardo2018-03-201-1/+1
| | | | | | | | | | | | | | | | | | Function bpf_fill_maxinsns11 is designed to not be able to be JITed on x86_64. So, it fails when CONFIG_BPF_JIT_ALWAYS_ON=y, and commit 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y") makes sure that failure is detected on that case. However, it does not fail on other architectures, which have a different JIT compiler design. So, test_bpf has started to fail to load on those. After this fix, test_bpf loads fine on both x86_64 and ppc64el. Fixes: 09584b406742 ("bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* error-injection: Fix to prohibit jump optimizationMasami Hiramatsu2018-03-121-0/+10
| | | | | | | | | | | Since the kprobe which was optimized by jump can not change the execution path, the kprobe for error-injection must not be optimized. To prohibit it, set a dummy post-handler as officially stated in Documentation/kprobes.txt. Fixes: 4b1a29a7f542 ("error-injection: Support fault injection framework") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
* Merge branch 'bnxt_en-Bug-fixes'David S. Miller2018-03-123-96/+118
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael Chan says: ==================== bnxt_en: Bug fixes. There are 3 bug fixes in this series to fix regressions recently introduced when adding the new ring reservations scheme. 2 minor fixes in the TC Flower code to return standard errno values and to elide some unnecessary warning dmesg. One Fixes the VLAN TCI value passed to the stack by including the entire 16-bit VLAN TCI, and the last fix is to check for valid VNIC ID before setting up or shutting down LRO/GRO. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().Michael Chan2018-03-121-0/+3
| | | | | | | | | | | | | | | | | | | | During initialization, if we encounter errors, there is a code path that calls bnxt_hwrm_vnic_set_tpa() with invalid VNIC ID. This may cause a warning in firmware logs. Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: close & open NIC, only when the interface is in running state.Venkat Duvvuru2018-03-121-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bnxt_restore_pf_fw_resources routine frees PF resources by calling close_nic and allocates the resources back, by doing open_nic. However, this is not needed, if the PF is already in closed state. This bug causes the driver to call open the device and call request_irq() when it is not needed. Ultimately, pci_disable_msix() will crash when bnxt_en is unloaded. This patch fixes the problem by skipping __bnxt_close_nic and __bnxt_open_nic inside bnxt_restore_pf_fw_resources routine, if the interface is not running. Fixes: 80fcaf46c092 ("bnxt_en: Restore MSIX after disabling SRIOV.") Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Return standard Linux error codes for hwrm flow cmds.Venkat Duvvuru2018-03-121-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, internal error value is returned by the driver, when hwrm_cfa_flow_alloc() fails due lack of resources. We should be returning Linux errno value -ENOSPC instead. This patch also converts other similar command errors to standard Linux errno code (-EIO) in bnxt_tc.c Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds") Signed-off-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Fix regressions when setting up MQPRIO TX rings.Michael Chan2018-03-121-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent changes added the bnxt_init_int_mode() call in the driver's open path whenever ring reservations are changed. This call was previously only called in the probe path. In the open path, if MQPRIO TC has been setup, the bnxt_init_int_mode() call would reset and mess up the MQPRIO per TC rings. Fix it by not re-initilizing bp->tx_nr_rings_per_tc in bnxt_init_int_mode(). Instead, initialize it in the probe path only after the bnxt_init_int_mode() call. Fixes: 674f50a5b026 ("bnxt_en: Implement new method to reserve rings.") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Pass complete VLAN TCI to the stack.Michael Chan2018-03-122-2/+3
| | | | | | | | | | | | | | | | | | When receiving a packet with VLAN tag, pass the entire 16-bit TCI to the stack when calling __vlan_hwaccel_put_tag(). The current code is only passing the 12-bit tag and it is missing the priority bits. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Remove unwanted ovs-offload messages in some conditionsSriharsha Basavapatna2018-03-121-8/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some conditions when the driver fails to add a flow in HW and returns an error back to the stack, the stack continues to invoke get_flow_stats() and/or del_flow() on it. The driver fails these APIs with an error message "no flow_node for cookie". The message gets logged repeatedly as long as the stack keeps invoking these functions. Fix this by removing the corresponding netdev_info() calls from these functions. Fixes: d7bc73053024 ("bnxt_en: add code to query TC flower offload stats") Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Fix vnic accounting in the bnxt_check_rings() path.Eddie Wai2018-03-121-44/+20Star
| | | | | | | | | | | | | | | | | | | | | | | | | | The number of vnics to check must be determined ahead of time because only standard RX rings require vnics to support RFS. The logic is similar to the ring reservation logic and we can now use the refactored common functions to do most of the work in setting up the firmware message. Fixes: 8f23d638b36b ("bnxt_en: Expand bnxt_check_rings() to check all resources.") Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * bnxt_en: Refactor the functions to reserve hardware rings.Michael Chan2018-03-121-32/+53
|/ | | | | | | | | The bnxt_hwrm_reserve_{pf|vf}_rings() functions are very similar to the bnxt_hwrm_check_{pf|vf}_rings() functions. Refactor the former so that the latter can make use of common code in the next patch. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: phy: Tell caller result of phy_change()Brad Mouring2018-03-122-74/+72Star
| | | | | | | | | | | | | | | | | In 664fcf123a30e (net: phy: Threaded interrupts allow some simplification) the phy_interrupt system was changed to use a traditional threaded interrupt scheme instead of a workqueue approach. With this change, the phy status check moved into phy_change, which did not report back to the caller whether or not the interrupt was handled. This means that, in the case of a shared phy interrupt, only the first phydev's interrupt registers are checked (since phy_interrupt() would always return IRQ_HANDLED). This leads to interrupt storms when it is a secondary device that's actually the interrupt source. Signed-off-by: Brad Mouring <brad.mouring@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* openvswitch: meter: fix the incorrect calculation of max delta_tzhangliping2018-03-121-3/+9
| | | | | | | | | | | Max delat_t should be the full_bucket/rate instead of the full_bucket. Also report EINVAL if the rate is zero. Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure") Cc: Andy Zhou <azhou@ovn.org> Signed-off-by: zhangliping <zhangliping02@baidu.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* macvlan: filter out unsupported feature flagsShannon Nelson2018-03-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adding a macvlan device on top of a lowerdev that supports the xfrm offloads fails with a new regression: # ip link add link ens1f0 mv0 type macvlan RTNETLINK answers: Operation not permitted Tracing down the failure shows that the macvlan device inherits the NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM feature flags from the lowerdev, but with no dev->xfrmdev_ops API filled in, it doesn't actually support xfrm. When the request is made to add the new macvlan device, the XFRM listener for NETDEV_REGISTER calls xfrm_api_check() which fails the new registration because dev->xfrmdev_ops is NULL. The macvlan creation succeeds when we filter out the ESP feature flags in macvlan_fix_features(), so let's filter them out like we're already filtering out ~NETIF_F_NETNS_LOCAL. When XFRM support is added in the future, we can add the flags into MACVLAN_FEATURES. This same problem could crop up in the future with any other new feature flags, so let's filter out any flags that aren't defined as supported in macvlan. Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API") Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com> Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'erspan-fixes'David S. Miller2018-03-091-1/+7
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | William Tu says: ==================== a couple of erspan fixes The series fixes a couple of erspan issues. The first patch adds the erspan v2 proto type to the ip6 tunnel lookup. The second patch improves the error handling when users screws the version number in metadata. The final patch makes sure the skb has enough headroom for pushing erspan header when xmit. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * ip6erspan: make sure enough headroom at xmit.William Tu2018-03-091-0/+3
| | | | | | | | | | | | | | | | | | The patch adds skb_cow_header() to ensure enough headroom at ip6erspan_tunnel_xmit before pushing the erspan header to the skb. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ip6erspan: improve error handling for erspan version number.William Tu2018-03-091-0/+2
| | | | | | | | | | | | | | | | | | | | When users fill in incorrect erspan version number through the struct erspan_metadata uapi, current code skips pushing the erspan header but continue pushing the gre header, which is incorrect. The patch fixes it by returning error. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ip6gre: add erspan v2 to tunnel lookupWilliam Tu2018-03-091-1/+2
|/ | | | | | | | The patch adds the erspan v2 proto in ip6gre_tunnel_lookup so the erspan v2 tunnel can be found correctly. Signed-off-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'mlxsw-ACL-and-mirroring-fixes'David S. Miller2018-03-096-5/+46
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ido Schimmel says: ==================== mlxsw: ACL and mirroring fixes The first patch fixes offload of rules using the 'pass' action. Instead of continuing to evaluate lower priority rules, the binding is terminated and the packet proceeds to the bridge and router blocks on ingress, or goes out of the port on egress. Second patch prevents the user from mirroring more than once from a given {Port, Direction} as this is not supported by the device. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum: Prevent duplicate mirrorsPetr Machata2018-03-092-4/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Spectrum ASIC doesn't support mirroring more than once from a single binding point (which is a port-direction pair). Therefore detect that a second binding of a given binding point is attempted. To that end, extend struct mlxsw_sp_span_inspected_port to track whether a given binding point is bound or not. Extend mlxsw_sp_span_entry_port_find() to look for ports based on the full unique key: port number, direction, and boundness. Besides fixing the overt bug where configured mirrors are not offloaded, this also fixes a more subtle bug: mlxsw_sp_span_inspected_port_del() just defers to mlxsw_sp_span_entry_bound_port_find(), and that used to find the first port with the right number (disregarding the type). Thus by adding and removing egress and ingress mirrors in the right order, one could trick the system into believing it has no egress mirrors when in fact it did have some. That then caused that mlxsw_sp_span_port_mtu_update() didn't update mirroring buffer when MTU was changed. Fixes: 763b4b70afcd ("mlxsw: spectrum: Add support in matchall mirror TC offloading") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * mlxsw: spectrum: Fix gact_ok offloadingJiri Pirko2018-03-095-1/+19
|/ | | | | | | | | | | For ok GACT action, TERMINATE binding_cmd should be used in action set passed down to HW. Fixes: b2925957ec1a9 ("mlxsw: spectrum_flower: Offload "ok" termination action") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reported-by: Alexander Petrovskiy <alexpe@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'vhost_net-ptr_ring-fixes'David S. Miller2018-03-093-4/+11
|\ | | | | | | | | | | | | | | | | | | | | | | | | Jason Wang says: ==================== Several fixes for vhost_net ptr_ring usage This small series try to fix several bugs of ptr_ring usage in vhost_net. Please review. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * vhost_net: examine pointer types during un-producingJason Wang2018-03-093-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | After commit fc72d1d54dd9 ("tuntap: XDP transmission"), we can actually queueing XDP pointers in the pointer ring, so we should examine the pointer type before freeing the pointer. Fixes: fc72d1d54dd9 ("tuntap: XDP transmission") Reported-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vhost_net: keep private_data and rx_ring syncedJason Wang2018-03-091-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | We get pointer ring from the exported sock, this means we should keep rx_ring and vq->private synced during both vq stop and backend set, otherwise we may see stale rx_ring. Fixes: c67df11f6e480 ("vhost_net: try batch dequing from skb array") Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * vhost_net: initialize rx_ring in vhost_net_open()Alexander Potapenko2018-03-091-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KMSAN reported a use of uninit memory in vhost_net_buf_unproduce() while trying to access n->vqs[VHOST_NET_VQ_TX].rx_ring: ================================================================== BUG: KMSAN: use of uninitialized memory in vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vho et.c:170 CPU: 0 PID: 3021 Comm: syz-fuzzer Not tainted 4.16.0-rc4+ #3853 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x185/0x1d0 lib/dump_stack.c:53 kmsan_report+0x142/0x1f0 mm/kmsan/kmsan.c:1093 __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676 vhost_net_buf_unproduce+0x7bb/0x9a0 drivers/vhost/net.c:170 vhost_net_stop_vq drivers/vhost/net.c:974 [inline] vhost_net_stop+0x146/0x380 drivers/vhost/net.c:982 vhost_net_release+0xb1/0x4f0 drivers/vhost/net.c:1015 __fput+0x49f/0xa00 fs/file_table.c:209 ____fput+0x37/0x40 fs/file_table.c:243 task_work_run+0x243/0x2c0 kernel/task_work.c:113 tracehook_notify_resume include/linux/tracehook.h:191 [inline] exit_to_usermode_loop arch/x86/entry/common.c:166 [inline] prepare_exit_to_usermode+0x349/0x3b0 arch/x86/entry/common.c:196 syscall_return_slowpath+0xf3/0x6d0 arch/x86/entry/common.c:265 do_syscall_64+0x34d/0x450 arch/x86/entry/common.c:292 ... origin: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:303 [inline] kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:213 kmsan_kmalloc_large+0x6f/0xd0 mm/kmsan/kmsan.c:392 kmalloc_large_node_hook mm/slub.c:1366 [inline] kmalloc_large_node mm/slub.c:3808 [inline] __kmalloc_node+0x100e/0x1290 mm/slub.c:3818 kmalloc_node include/linux/slab.h:554 [inline] kvmalloc_node+0x1a5/0x2e0 mm/util.c:419 kvmalloc include/linux/mm.h:541 [inline] vhost_net_open+0x64/0x5f0 drivers/vhost/net.c:921 misc_open+0x7b5/0x8b0 drivers/char/misc.c:154 chrdev_open+0xc28/0xd90 fs/char_dev.c:417 do_dentry_open+0xccb/0x1430 fs/open.c:752 vfs_open+0x272/0x2e0 fs/open.c:866 do_last fs/namei.c:3378 [inline] path_openat+0x49ad/0x6580 fs/namei.c:3519 do_filp_open+0x267/0x640 fs/namei.c:3553 do_sys_open+0x6ad/0x9c0 fs/open.c:1059 SYSC_openat+0xc7/0xe0 fs/open.c:1086 SyS_openat+0x63/0x90 fs/open.c:1080 do_syscall_64+0x2f1/0x450 arch/x86/entry/common.c:287 ================================================================== Fixes: c67df11f6e480 ("vhost_net: try batch dequing from skb array") Signed-off-by: Alexander Potapenko <glider@google.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ethernet: ave: enable Rx drop interruptKunihiko Hayashi2018-03-091-1/+1
| | | | | | | | | This enables AVE_GI_RXDROP interrupt factor. This factor indicates depletion of Rx descriptors and the handler counts the number of dropped packets. Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use skb_is_gso_sctp() instead of open-codingDaniel Axtens2018-03-096-9/+12
| | | | | | | | | | | | | As well as the basic conversion, I noticed that a lot of the SCTP code checks gso_type without first checking skb_is_gso() so I have added that where appropriate. Also, document the helper. Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* ieee802154: 6lowpan: fix possible NULL deref in lowpan_device_event()Eric Dumazet2018-03-091-4/+8
| | | | | | | | | | | | | | | | A tun device type can trivially be set to arbitrary value using TUNSETLINK ioctl(). Therefore, lowpan_device_event() must really check that ieee802154_ptr is not NULL. Fixes: 2c88b5283f60d ("ieee802154: 6lowpan: remove check on null") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Aring <alex.aring@gmail.com> Cc: Stefan Schmidt <stefan@osg.samsung.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Stefan Schmidt <stefan@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix access to non-linear packet in ndisc_fill_redirect_hdr_option()Lorenzo Bianconi2018-03-091-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix the following slab-out-of-bounds kasan report in ndisc_fill_redirect_hdr_option when the incoming ipv6 packet is not linear and the accessed data are not in the linear data region of orig_skb. [ 1503.122508] ================================================================== [ 1503.122832] BUG: KASAN: slab-out-of-bounds in ndisc_send_redirect+0x94e/0x990 [ 1503.123036] Read of size 1184 at addr ffff8800298ab6b0 by task netperf/1932 [ 1503.123220] CPU: 0 PID: 1932 Comm: netperf Not tainted 4.16.0-rc2+ #124 [ 1503.123347] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014 [ 1503.123527] Call Trace: [ 1503.123579] <IRQ> [ 1503.123638] print_address_description+0x6e/0x280 [ 1503.123849] kasan_report+0x233/0x350 [ 1503.123946] memcpy+0x1f/0x50 [ 1503.124037] ndisc_send_redirect+0x94e/0x990 [ 1503.125150] ip6_forward+0x1242/0x13b0 [...] [ 1503.153890] Allocated by task 1932: [ 1503.153982] kasan_kmalloc+0x9f/0xd0 [ 1503.154074] __kmalloc_track_caller+0xb5/0x160 [ 1503.154198] __kmalloc_reserve.isra.41+0x24/0x70 [ 1503.154324] __alloc_skb+0x130/0x3e0 [ 1503.154415] sctp_packet_transmit+0x21a/0x1810 [ 1503.154533] sctp_outq_flush+0xc14/0x1db0 [ 1503.154624] sctp_do_sm+0x34e/0x2740 [ 1503.154715] sctp_primitive_SEND+0x57/0x70 [ 1503.154807] sctp_sendmsg+0xaa6/0x1b10 [ 1503.154897] sock_sendmsg+0x68/0x80 [ 1503.154987] ___sys_sendmsg+0x431/0x4b0 [ 1503.155078] __sys_sendmsg+0xa4/0x130 [ 1503.155168] do_syscall_64+0x171/0x3f0 [ 1503.155259] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.155436] Freed by task 1932: [ 1503.155527] __kasan_slab_free+0x134/0x180 [ 1503.155618] kfree+0xbc/0x180 [ 1503.155709] skb_release_data+0x27f/0x2c0 [ 1503.155800] consume_skb+0x94/0xe0 [ 1503.155889] sctp_chunk_put+0x1aa/0x1f0 [ 1503.155979] sctp_inq_pop+0x2f8/0x6e0 [ 1503.156070] sctp_assoc_bh_rcv+0x6a/0x230 [ 1503.156164] sctp_inq_push+0x117/0x150 [ 1503.156255] sctp_backlog_rcv+0xdf/0x4a0 [ 1503.156346] __release_sock+0x142/0x250 [ 1503.156436] release_sock+0x80/0x180 [ 1503.156526] sctp_sendmsg+0xbb0/0x1b10 [ 1503.156617] sock_sendmsg+0x68/0x80 [ 1503.156708] ___sys_sendmsg+0x431/0x4b0 [ 1503.156799] __sys_sendmsg+0xa4/0x130 [ 1503.156889] do_syscall_64+0x171/0x3f0 [ 1503.156980] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 1503.157158] The buggy address belongs to the object at ffff8800298ab600 which belongs to the cache kmalloc-1024 of size 1024 [ 1503.157444] The buggy address is located 176 bytes inside of 1024-byte region [ffff8800298ab600, ffff8800298aba00) [ 1503.157702] The buggy address belongs to the page: [ 1503.157820] page:ffffea0000a62a00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0 [ 1503.158053] flags: 0x4000000000008100(slab|head) [ 1503.158171] raw: 4000000000008100 0000000000000000 0000000000000000 00000001800e000e [ 1503.158350] raw: dead000000000100 dead000000000200 ffff880036002600 0000000000000000 [ 1503.158523] page dumped because: kasan: bad access detected [ 1503.158698] Memory state around the buggy address: [ 1503.158816] ffff8800298ab900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.158988] ffff8800298ab980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 1503.159165] >ffff8800298aba00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 1503.159338] ^ [ 1503.159436] ffff8800298aba80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159610] ffff8800298abb00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 1503.159785] ================================================================== [ 1503.159964] Disabling lock debugging due to kernel taint The test scenario to trigger the issue consists of 4 devices: - H0: data sender, connected to LAN0 - H1: data receiver, connected to LAN1 - GW0 and GW1: routers between LAN0 and LAN1. Both of them have an ethernet connection on LAN0 and LAN1 On H{0,1} set GW0 as default gateway while on GW0 set GW1 as next hop for data from LAN0 to LAN1. Moreover create an ip6ip6 tunnel between H0 and H1 and send 3 concurrent data streams (TCP/UDP/SCTP) from H0 to H1 through ip6ip6 tunnel (send buffer size is set to 16K). While data streams are active flush the route cache on HA multiple times. I have not been able to identify a given commit that introduced the issue since, using the reproducer described above, the kasan report has been triggered from 4.14 and I have not gone back further. Reported-by: Jianlin Shi <jishi@redhat.com> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'hv_netvsc-fix-multicast-flags-and-sync'David S. Miller2018-03-083-7/+21
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen Hemminger says: ==================== hv_netvsc: fix multicast flags and sync This set of patches deals with the handling of multicast flags and addresses in transparent VF mode. The recent set of patches (in linux-net) had a couple of bugs. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * hv_netvsc: fix locking during VF setupStephen Hemminger2018-03-081-0/+4
| | | | | | | | | | | | | | | | | | The dev_uc/mc_sync calls need to have the device address list locked. This was spotted by running with lockdep enabled. Fixes: bee9d41b37ea ("hv_netvsc: propagate rx filters to VF") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * hv_netvsc: fix locking for rx_modeStephen Hemminger2018-03-081-3/+8
| | | | | | | | | | | | | | | | | | | | The rx_mode operation handler is different than other callbacks in that is not always called with rtnl held. Therefore use RCU to ensure that references are valid. Fixes: bee9d41b37ea ("hv_netvsc: propagate rx filters to VF") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * hv_netvsc: avoid repeated updates of packet filterStephen Hemminger2018-03-082-2/+7
| | | | | | | | | | | | | | | | | | | | | | The netvsc driver can get repeated calls to netvsc_rx_mode during network setup; each of these calls ends up scheduling the lower layers to update tha packet filter. This update requires an request/response to the host. So avoid doing this if we already know that the correct packet filter value is set. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * hv_netvsc: fix filter flagsStephen Hemminger2018-03-081-2/+2
|/ | | | | | | | | The recent change to not always enable all multicast and broadcast was broken; meant to set filter, not change flags. Fixes: 009f766ca238 ("hv_netvsc: filter multicast/broadcast") Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge tag 'wireless-drivers-for-davem-2018-03-08' of ↵David S. Miller2018-03-0825-88/+212
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers Kalle Valo: ==================== wireless-drivers fixes for 4.16 Quote a few fixes as I have not been able to send a pull request earlier. Most of the fixes for iwlwifi but also few others, nothing really standing out though. iwlwifi * fix a bogus warning when freeing a TFD * fix severe throughput problem with 9000 series * fix for a bug that caused queue hangs in certain situations * fix for an issue with IBSS * fix an issue with rate-scaling in AP-mode * fix Channel Switch Announcement (CSA) issues with count 0 and 1 * some firmware debugging fixes * remov a wrong error message when removing keys * fix a firmware sysassert most usually triggered in IBSS * a couple of fixes on multicast queues * a fix with CCMP 256 rtlwifi * fix loss of signal for rtl8723be brcmfmac * add possibility to obtain firmware error * fix P2P_DEVICE ethernet address generation ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * brcmfmac: fix P2P_DEVICE ethernet address generationArend Van Spriel2018-03-071-13/+11Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The firmware has a requirement that the P2P_DEVICE address should be different from the address of the primary interface. When not specified by user-space, the driver generates the MAC address for the P2P_DEVICE interface using the MAC address of the primary interface and setting the locally administered bit. However, the MAC address of the primary interface may already have that bit set causing the creation of the P2P_DEVICE interface to fail with -EBUSY. Fix this by using a random address instead to determine the P2P_DEVICE address. Cc: stable@vger.kernel.org # 3.10.y Reported-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
| * brcmfmac: add possibility to obtain firmware errorArend Van Spriel2018-03-073-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The feature module needs to evaluate the actual firmware error return upon a control command. This adds a flag to struct brcmf_if that the caller can set. This flag is checked to determine the error code that needs to be returned. Fixes: b69c1df47281 ("brcmfmac: separate firmware errors from i/o errors") Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
| * Merge tag 'iwlwifi-for-kalle-2018-03-02' of ↵Kalle Valo2018-03-0614-27/+125
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes Second batch of iwlwifi fixes intended for 4.16: * Fix CSA issues with count 0 and 1; * Some firmware debugging fixes; * Removed a wrong error message when removing keys; * Fix a firmware sysassert most usually triggered in IBSS; * A couple of fixes on multicast queues; * A fix with CCMP 256;
| | * iwlwifi: fix malformed CONFIG_IWLWIFI_PCIE_RTPM defaultUlf Magnusson2018-03-021-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'default false' should be 'default n', though they happen to have the same effect here, due to undefined symbols ('false' in this case) evaluating to n in a tristate sense. Remove the default instead of changing it. bool and tristate symbols implicitly default to n. Discovered with the https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py script. Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: mvm: Correctly set the tid for mcast queueIlan Peer2018-03-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the scheduler config command, the meaning of tid == 0xf was intended to indicate the configuration is for management frames. However, tid == 0xf was also used for the multicast queue that was meant only for multicast data frames, which resulted with the FW not encrypting multicast data frames. As multicast frames do not have a QoS header, fix this by setting tid == 0, to indicate that this is a data queue and not management one. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: mvm: Direct multicast frames to the correct stationIlan Peer2018-03-021-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Multicast frames for NL80211_IFTYPE_AP and NL80211_IFTYPE_ADHOC were directed to the broadcast station, however, as the broadcast station did not have keys configured, these frames were sent unencrypted. Fix this by using the multicast station which is the station for which encryption keys are configured. Signed-off-by: Ilan Peer <ilan.peer@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: mvm: fix "failed to remove key" messageSara Sharon2018-03-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the GTK is installed, we install it to HW with the station ID of the AP. Mac80211 will try to remove it only after the AP sta is removed, which will result in a failure to remove key since we do not have any station for it. This is a valid situation, but a previous commit removed the early return and added a return with error value, which resulted in an error message that is confusing to users. Remove the error return value. Fixes: 85aeb58cec1a ("iwlwifi: mvm: Enable security on new TX API") Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: avoid collecting firmware dump if not loadedShaul Triebitz2018-03-025-5/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trying to collect firmware debug data while firmware is not loaded causes various errors (e.g. failing NIC access). This causes even a bigger issue if at that time the HW radio is off. In that case, when later turning the radio on, the Driver fails to read the HW (registers contain garbage values). (It may be that the CSR_GP_CNTRL_REG_FLAG_RFKILL_WAKE_L1A_EN bit is cleared on faulty NIC access - since the same behavior was seen in HW RFKILL toggling before setting that bit.) Signed-off-by: Shaul Triebitz <shaul.triebitz@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: mvm: fix assert 0x2B00 on older FWsSara Sharon2018-03-021-10/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should add the multicast station before adding the broadcast station. However, in older FW, the firmware will start beaconing when we add the multicast station, and since the broadcast station is not added at this point so the transmission of the beacon will fail on assert 0x2b00. This is fixed in later firmware, so make the order of addition depend on the TLV. Fixes: 26d6c16bed53 ("iwlwifi: mvm: add multicast station") Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: mvm: Fix channel switch for count 0 and 1Andrei Otcheretianski2018-03-022-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was assumed that apply_time==0 implies immediate scheduling, which is wrong. Instead, the fw expects the START_IMMEDIATELY flag to be set. Otherwise, this resulted in 0x3063 assert. Fix that. While at it rename the T2_V2_START_IMMEDIATELY to TE_V2_START_IMMEDIATELY. Fixes: f5d8f50f271d ("iwlwifi: mvm: Fix channel switch in case of count <= 1") Signed-off-by: Andrei Otcheretianski <andrei.otcheretianski@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: mvm: fix TX of CCMP 256Sara Sharon2018-03-021-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | We don't have enough room in the TX command for a CCMP 256 key, and need to use key from table. Fixes: 3264bf032bd9 ("[BUGFIX] iwlwifi: mvm: Fix CCMP IV setting") Signed-off-by: Sara Sharon <sara.sharon@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
| | * iwlwifi: Cancel and set MARKER_CMD timer during suspend-resumeHaim Dreyfuss2018-03-024-0/+42
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | While entering to D3 mode there is a gap between the time the driver handles the D3_CONFIG_CMD response to the time the host is going to sleep. In between there might be cases which MARKER_CMD can tailgate. Also during resume flow the MARKER_CMD might get sent while D0I3_CMD is being handled in the FW. Cancel MARKER_CMD timer and set it again properly during suspend resume flows to prevent this command from being sent accidentlly. Signed-off-by: Haim Dreyfuss <haim.dreyfuss@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>