summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* xprtrdma: Fix BUG after a device removalChuck Lever2018-02-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Michal Kalderon reports a BUG that occurs just after device removal: [ 169.112490] rpcrdma: removing device qedr0 for 192.168.110.146:20049 [ 169.143909] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [ 169.181837] IP: rpcrdma_dma_unmap_regbuf+0xa/0x60 [rpcrdma] The RPC/RDMA client transport attempts to allocate some resources on demand. Registered buffers are one such resource. These are allocated (or re-allocated) by xprt_rdma_allocate to hold RPC Call and Reply messages. A hardware resource is associated with each of these buffers, as they can be used for a Send or Receive Work Request. If a device is removed from under an NFS/RDMA mount, the transport layer is responsible for releasing all hardware resources before the device can be finally unplugged. A BUG results when the NFS mount hasn't yet seen much activity: the transport tries to release resources that haven't yet been allocated. rpcrdma_free_regbuf() already checks for this case, so just move that check to cover the DEVICE_REMOVAL case as well. Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com> Fixes: bebd031866ca ("xprtrdma: Support unplugging an HCA ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Michal Kalderon <Michal.Kalderon@cavium.com> Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Fix calculation of ri_max_send_sgesChuck Lever2018-02-022-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs") introduced the rpcrdma_ia::ri_max_send_sges field. This fixes a problem where xprtrdma would not work if the device's max_sge capability was small (low single digits). At least RPCRDMA_MIN_SEND_SGES are needed for the inline parts of each RPC. ri_max_send_sges is set to this value: ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES; Then when marshaling each RPC, rpcrdma_args_inline uses that value to determine whether the device has enough Send SGEs to convey an NFS WRITE payload inline, or whether instead a Read chunk is required. More recently, commit ae72950abf99 ("xprtrdma: Add data structure to manage RDMA Send arguments") used the ri_max_send_sges value to calculate the size of an array, but that commit erroneously assumed ri_max_send_sges contains a value similar to the device's max_sge, and not one that was reduced by the minimum SGE count. This assumption results in the calculated size of the sendctx's Send SGE array to be too small. When the array is used to marshal an RPC, the code can write Send SGEs into the following sendctx element in that array, corrupting it. When the device's max_sge is large, this issue is entirely harmless; but it results in an oops in the provider's post_send method, if dev.attrs.max_sge is small. So let's straighten this out: ri_max_send_sges will now contain a value with the same meaning as dev.attrs.max_sge, which makes the code easier to understand, and enables rpcrdma_sendctx_create to calculate the size of the SGE array correctly. Reported-by: Michal Kalderon <Michal.Kalderon@cavium.com> Fixes: 16f906d66cd7 ("xprtrdma: Reduce required number of send SGEs") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Michal Kalderon <Michal.Kalderon@cavium.com> Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Micro-optimize __rpc_executeChuck Lever2018-01-231-13/+11Star
| | | | | | | | | | | | | The common case: There are 13 to 14 actions per RPC, and tk_callback is non-NULL in only one of them. There's no need to store a NULL in the tk_callback field during each FSM step. This slightly improves throughput results in dbench and other multi- threaded benchmarks on my two-socket client on 56Gb InfiniBand, but will probably be inconsequential on slower systems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: task_run_action should display tk_callbackChuck Lever2018-01-231-1/+1
| | | | | | | | | | | | | | | | | This shows up in every RPC: kworker/4:1-19772 [004] 3467.373443: rpc_task_run_action: task:4711@2 flags=0e81 state=0005 status=0 action=call_status kworker/4:1-19772 [004] 3467.373444: rpc_task_run_action: task:4711@2 flags=0e81 state=0005 status=0 action=call_status What's actually going on is that the first iteration of the RPC scheduler is invoking the function in tk_callback (in this case, xprt_timer), then invoking call_status on the next iteration. Feeding do_action, rather than tk_action, to the "task_run_action" trace point will now always display the correct FSM step. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* SUNRPC: Trace xprt_timer eventsChuck Lever2018-01-232-3/+1Star
| | | | | | | | Track RPC timeouts: report the XID and the server address to match the content of network capture. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Correct some documenting commentsChuck Lever2018-01-232-2/+2
| | | | | | | | | | | | | Fix kernel-doc warnings in net/sunrpc/xprtrdma/ . net/sunrpc/xprtrdma/verbs.c:1575: warning: No description found for parameter 'count' net/sunrpc/xprtrdma/verbs.c:1575: warning: Excess function parameter 'min_reqs' description in 'rpcrdma_ep_post_extra_recv' net/sunrpc/xprtrdma/backchannel.c:288: warning: No description found for parameter 'r_xprt' net/sunrpc/xprtrdma/backchannel.c:288: warning: Excess function parameter 'xprt' description in 'rpcrdma_bc_receive_call' Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Fix "bytes registered" accountingChuck Lever2018-01-231-2/+2
| | | | | | | | | | | The contents of seg->mr_len changed when ->ro_map stopped returning the full chunk length in the first segment. Count the full length of each Write chunk, not the length of the first segment (which now can only be as large as a page). Fixes: 9d6b04097882 ("xprtrdma: Place registered MWs on a ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Instrument allocation/release of rpcrdma_req/rep objectsChuck Lever2018-01-232-9/+7Star
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points to instrument QP and CQ access upcallsChuck Lever2018-01-231-0/+3
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points in the client-side backchannel code pathsChuck Lever2018-01-232-5/+5
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points for connect eventsChuck Lever2018-01-232-28/+17Star
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points to instrument MR allocation and recoveryChuck Lever2018-01-231-3/+3
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points to instrument memory invalidationChuck Lever2018-01-233-16/+16
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points in reply decoder pathChuck Lever2018-01-231-20/+9Star
| | | | | | | | This includes decoding Write and Reply chunks, and fixing up inline payloads. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points to instrument memory registrationChuck Lever2018-01-232-22/+7Star
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points in the RPC Reply handler pathsChuck Lever2018-01-232-19/+9Star
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Add trace points in RPC Call transmit pathsChuck Lever2018-01-232-15/+5Star
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* rpcrdma: infrastructure for static trace points in rpcrdma.koChuck Lever2018-01-232-5/+9
| | | | | Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Introduce rpcrdma_mw_unmap_and_putChuck Lever2018-01-164-23/+33
| | | | | | | | | Clean up: Code review suggested that a common bit of code can be placed into a helper function, and this gives us fewer places to stick an "I DMA unmapped something" trace point. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove usage of "mw"Chuck Lever2018-01-165-278/+292
| | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: struct rpcrdma_mw was named after Memory Windows, but xprtrdma no longer supports a Memory Window registration mode. Rename rpcrdma_mw and its fields to reduce confusion and make the code more sensible to read. Renaming "mw" was suggested by Tom Talpey, the author of the original xprtrdma implementation. It's a good idea, but I haven't done this until now because it's a huge diffstat for no benefit other than code readability. However, I'm about to introduce static trace points that expose a few of xprtrdma's internal data structures. They should make sense in the trace report, and it's reasonable to treat trace points as a kernel API contract which might be difficult to change later. While I'm churning things up, two additional changes: - rename variables unhelpfully called "r" to "mr", to improve code clarity, and - rename the MR-related helper functions using the form "rpcrdma_mr_<verb>", to be consistent with other areas of the code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Replace all usage of "frmr" with "frwr"Chuck Lever2018-01-164-99/+99
| | | | | | | | | | | | | | | Clean up: Over time, the industry has adopted the term "frwr" instead of "frmr". The term "frwr" is now more widely recognized. For the past couple of years I've attempted to add new code using "frwr" , but there still remains plenty of older code that still uses "frmr". Replace all usage of "frmr" to avoid confusion. While we're churning code, rename variables unhelpfully called "f" to "frwr", to improve code clarity. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Don't clear RPC_BC_PA_IN_USE on pre-allocated rpc_rqst'sChuck Lever2018-01-161-6/+1Star
| | | | | | | | | | No need for the overhead of atomically setting and clearing this bit flag for every use of a pre-allocated backchannel rpc_rqst. These are a distinct pool of rpc_rqsts that are used only for callback operations, so it is safe to simply leave the bit set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Split xprt_rdma_send_requestChuck Lever2018-01-164-32/+50
| | | | | | | | | | | | Clean up. @rqst is set up differently for backchannel Replies. For example, rqst->rq_task and task->tk_client are both NULL. So it is easier to understand and maintain this code path if it is separated. Also, we can get rid of the confusing rl_connect_cookie hack in rpcrdma_bc_receive_call. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: buf_free not called for CB repliesChuck Lever2018-01-163-6/+1Star
| | | | | | | | | | | | | | Since commit 5a6d1db45569 ("SUNRPC: Add a transport-specific private field in rpc_rqst"), the rpc_rqst's for RPC-over-RDMA backchannel operations leave rq_buffer set to NULL. xprt_release does not invoke ->op->buf_free when rq_buffer is NULL. The RPCRDMA_REQ_F_BACKCHANNEL check in xprt_rdma_free is therefore redundant because xprt_rdma_free is not invoked for backchannel requests. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Move unmap-safe logic to rpcrdma_marshal_reqChuck Lever2018-01-162-5/+11
| | | | | | | | | Clean up. This logic is related to marshaling the request, and I'd like to keep everything that touches req->rl_registered close together, for CPU cache efficiency. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Support IPv6 in xprt_rdma_set_portChuck Lever2018-01-161-4/+24
| | | | | | | | | | | | Clean up a harmless oversight. xprtrdma's ->set_port method has never properly supported IPv6. This issue has never been a problem because NFS/RDMA mounts have always required "port=20049", thus so far, rpcbind is not invoked for these mounts. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove another sockaddr_storage field (cdata::addr)Chuck Lever2018-01-163-19/+12Star
| | | | | | | | | | Save more space in struct rpcrdma_xprt by removing the redundant "addr" field from struct rpcrdma_create_data_internal. Wherever we have rpcrdma_xprt, we also have the rpc_xprt, which has a sockaddr_storage field with the same content. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Initialize the xprt address string array earlierChuck Lever2018-01-163-13/+21
| | | | | | | | | | | This makes the address strings available for debugging messages in earlier stages of transport set up. The first benefit is to get rid of the single-use rep_remote_addr field, saving 128+ bytes in struct rpcrdma_ep. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove unused padding variablesChuck Lever2018-01-162-7/+3Star
| | | | | | | | | Clean up. Remove fields that should have been removed by commit b3221d6a53c4 ("xprtrdma: Remove logic that constructs RDMA_MSGP type calls"). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Remove ri_reminv_expectedChuck Lever2018-01-162-3/+0Star
| | | | | | | | | | Clean up. Commit b5f0afbea4f2 ("xprtrdma: Per-connection pad optimization") should have removed this. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Per-mode handling for Remote InvalidationChuck Lever2018-01-164-30/+27Star
| | | | | | | | | | | | Refactoring change: Remote Invalidation is particular to the memory registration mode that is use. Use a callout instead of a generic function to handle Remote Invalidation. This gets rid of the 8-byte flags field in struct rpcrdma_mw, of which only a single bit flag has been allocated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Eliminate unnecessary lock cycle in xprt_rdma_send_requestChuck Lever2018-01-161-1/+1
| | | | | | | | | | The rpcrdma_req is not shared yet, and its associated Send hasn't been posted, thus RMW should be safe. There's no need for the expense of a lock cycle here. Fixes: 0ba6f37012db ("xprtrdma: Refactor rpcrdma_deferred_completion") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Fix backchannel allocation of extra rpcrdma_repsChuck Lever2018-01-163-24/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | The backchannel code uses rpcrdma_recv_buffer_put to add new reps to the free rep list. This also decrements rb_recv_count, which spoofs the receive overrun logic in rpcrdma_buffer_get_rep. Commit 9b06688bc3b9 ("xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock)") replaced the original open-coded list_add with a call to rpcrdma_recv_buffer_put(), but then a year later, commit 05c974669ece ("xprtrdma: Fix receive buffer accounting") added rep accounting to rpcrdma_recv_buffer_put. It was an oversight to let the backchannel continue to use this function. The fix this, let's combine the "add to free list" logic with rpcrdma_create_rep. Also, do not allocate RPCRDMA_MAX_BC_REQUESTS rpcrdma_reps in rpcrdma_buffer_create and then allocate additional rpcrdma_reps in rpcrdma_bc_setup_reps. Allocating the extra reps during backchannel set-up is sufficient. Fixes: 05c974669ece ("xprtrdma: Fix receive buffer accounting") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* xprtrdma: Fix buffer leak after transport set up failureChuck Lever2018-01-161-11/+2Star
| | | | | | | | | | | | | | | This leak has been around forever, and is exceptionally rare. EINVAL causes mount to fail with "an incorrect mount option was specified" although it's not likely that one of the mount options is incorrect. Instead, return ENODEV in this case, as this appears to be an issue with system or device configuration rather than a specific mount option. Some obsolete comments are also removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* Merge tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds2017-12-165-15/+24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client fixes from Anna Schumaker: "This has two stable bugfixes, one to fix a BUG_ON() when nfs_commit_inode() is called with no outstanding commit requests and another to fix a race in the SUNRPC receive codepath. Additionally, there are also fixes for an NFS client deadlock and an xprtrdma performance regression. Summary: Stable bugfixes: - NFS: Avoid a BUG_ON() in nfs_commit_inode() by not waiting for a commit in the case that there were no commit requests. - SUNRPC: Fix a race in the receive code path Other fixes: - NFS: Fix a deadlock in nfs client initialization - xprtrdma: Fix a performance regression for small IOs" * tag 'nfs-for-4.15-3' of git://git.linux-nfs.org/projects/anna/linux-nfs: SUNRPC: Fix a race in the receive code path nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests xprtrdma: Spread reply processing over more CPUs nfs: fix a deadlock in nfs client initialization
| * SUNRPC: Fix a race in the receive code pathTrond Myklebust2017-12-151-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot race with the call to xprt_complete_rqst(). Reported-by: Chuck Lever <chuck.lever@oracle.com> Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317 Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..") Cc: stable@vger.kernel.org # 4.14+ Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Spread reply processing over more CPUsChuck Lever2017-12-154-6/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler directly from RECV completion") introduced a performance regression for NFS I/O small enough to not need memory registration. In multi- threaded benchmarks that generate primarily small I/O requests, IOPS throughput is reduced by nearly a third. This patch restores the previous level of throughput. Because workqueues are typically BOUND (in particular ib_comp_wq, nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend to aggregate on the CPU that is handling Receive completions. The usual approach to addressing this problem is to create a QP and CQ for each CPU, and then schedule transactions on the QP for the CPU where you want the transaction to complete. The transaction then does not require an extra context switch during completion to end up on the same CPU where the transaction was started. This approach doesn't work for the Linux NFS/RDMA client because currently the Linux NFS client does not support multiple connections per client-server pair, and the RDMA core API does not make it straightforward for ULPs to determine which CPU is responsible for handling Receive completions for a CQ. So for the moment, record the CPU number in the rpcrdma_req before the transport sends each RPC Call. Then during Receive completion, queue the RPC completion on that same CPU. Additionally, move all RPC completion processing to the deferred handler so that even RPCs with simple small replies complete on the CPU that sent the corresponding RPC Call. Fixes: d8f532d20ee4 ("xprtrdma: Invoke rpcrdma_reply_handler ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2017-12-1545-128/+329
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) Clamp timeouts to INT_MAX in conntrack, from Jay Elliot. 2) Fix broken UAPI for BPF_PROG_TYPE_PERF_EVENT, from Hendrik Brueckner. 3) Fix locking in ieee80211_sta_tear_down_BA_sessions, from Johannes Berg. 4) Add missing barriers to ptr_ring, from Michael S. Tsirkin. 5) Don't advertise gigabit in sh_eth when not available, from Thomas Petazzoni. 6) Check network namespace when delivering to netlink taps, from Kevin Cernekee. 7) Kill a race in raw_sendmsg(), from Mohamed Ghannam. 8) Use correct address in TCP md5 lookups when replying to an incoming segment, from Christoph Paasch. 9) Add schedule points to BPF map alloc/free, from Eric Dumazet. 10) Don't allow silly mtu values to be used in ipv4/ipv6 multicast, also from Eric Dumazet. 11) Fix SKB leak in tipc, from Jon Maloy. 12) Disable MAC learning on OVS ports of mlxsw, from Yuval Mintz. 13) SKB leak fix in skB_complete_tx_timestamp(), from Willem de Bruijn. 14) Add some new qmi_wwan device IDs, from Daniele Palmas. 15) Fix static key imbalance in ingress qdisc, from Jiri Pirko. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (76 commits) net: qcom/emac: Reduce timeout for mdio read/write net: sched: fix static key imbalance in case of ingress/clsact_init error net: sched: fix clsact init error path ip_gre: fix wrong return value of erspan_rcv net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support pkt_sched: Remove TC_RED_OFFLOADED from uapi net: sched: Move to new offload indication in RED net: sched: Add TCA_HW_OFFLOAD net: aquantia: Increment driver version net: aquantia: Fix typo in ethtool statistics names net: aquantia: Update hw counters on hw init net: aquantia: Improve link state and statistics check interval callback net: aquantia: Fill in multicast counter in ndev stats from hardware net: aquantia: Fill ndev stat couters from hardware net: aquantia: Extend stat counters to 64bit values net: aquantia: Fix hardware DMA stream overload on large MRRS net: aquantia: Fix actual speed capabilities reporting sock: free skb in skb_complete_tx_timestamp on error s390/qeth: update takeover IPs after configuration change s390/qeth: lock IP table while applying takeover changes ...
| * | net: sched: fix static key imbalance in case of ingress/clsact_init errorJiri Pirko2017-12-151-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move static key increments to the beginning of the init function so they pair 1:1 with decrements in ingress/clsact_destroy, which is called in case ingress/clsact_init fails. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sched: fix clsact init error pathJiri Pirko2017-12-152-7/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since in qdisc_create, the destroy op is called when init fails, we don't do cleanup in init and leave it up to destroy. This fixes use-after-free when trying to put already freed block. Fixes: 6e40cf2d4dee ("net: sched: use extended variants of block_get/put in ingress and clsact qdiscs") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ip_gre: fix wrong return value of erspan_rcvHaishuang Yan2017-12-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If pskb_may_pull return failed, return PACKET_REJECT instead of -ENOMEM. Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Cc: William Tu <u9012063@gmail.com> Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sched: Move to new offload indication in REDYuval Mintz2017-12-151-16/+15Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let RED utilize the new internal flag, TCQ_F_OFFLOADED, to mark a given qdisc as offloaded instead of using a dedicated indication. Also, change internal logic into looking at said flag when possible. Fixes: 602f3baf2218 ("net_sch: red: Add offload ability to RED qdisc") Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sched: Add TCA_HW_OFFLOADYuval Mintz2017-12-151-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Qdiscs can be offloaded to HW, but current implementation isn't uniform. Instead, qdiscs either pass information about offload status via their TCA_OPTIONS or omit it altogether. Introduce a new attribute - TCA_HW_OFFLOAD that would form a uniform uAPI for the offloading status of qdiscs. Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | sock: free skb in skb_complete_tx_timestamp on errorWillem de Bruijn2017-12-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | skb_complete_tx_timestamp must ingest the skb it is passed. Call kfree_skb if the skb cannot be enqueued. Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()") Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | Merge tag 'batadv-net-for-davem-20171215' of git://git.open-mesh.org/linux-mergeDavid S. Miller2017-12-154-5/+7
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Simon Wunderlich says: ==================== Here are some batman-adv bugfixes: - Initialize the fragment headers, by Sven Eckelmann - Fix a NULL check in BATMAN V, by Sven Eckelmann - Fix kernel doc for the time_setup() change, by Sven Eckelmann - Use the right lock in BATMAN IV OGM Update, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | batman-adv: Fix lock for ogm cnt access in batadv_iv_ogm_calc_tqSven Eckelmann2017-12-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The originator node object orig_neigh_node is used to when accessing the bcast_own(_sum) and real_packet_count information. The access to them has to be protected with the spinlock in orig_neigh_node. But the function uses the lock in orig_node instead. This is incorrect because they could be two different originator node objects. Fixes: 0ede9f41b217 ("batman-adv: protect bit operations to count OGMs with spinlock") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| | * | batman-adv: Fix kernel-doc for timer functionsSven Eckelmann2017-12-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit e99e88a9d2b0 ("treewide: setup_timer() -> timer_setup()") changed the argument name and type of the timer function but didn't adjust the kernel-doc of these functions. Signed-off-by: Sven Eckelmann <sven@narfation.org> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| | * | batman-adv: Fix check of retrieved orig_gw in batadv_v_gw_is_eligibleSven Eckelmann2017-12-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The batadv_v_gw_is_eligible function already assumes that orig_node is not NULL. But batadv_gw_node_get may have failed to find the originator. It must therefore be checked whether the batadv_gw_node_get failed and not whether orig_node is NULL to detect this error. Fixes: 50164d8f500f ("batman-adv: B.A.T.M.A.N. V - implement GW selection logic") Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Acked-by: Antonio Quartulli <a@unstable.cc> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| | * | batman-adv: Always initialize fragment header prioritySven Eckelmann2017-12-011-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The batman-adv unuicast fragment header contains 3 bits for the priority of the packet. These bits will be initialized when the skb->priority contains a value between 256 and 263. But otherwise, the uninitialized bits from the stack will be used. Fixes: c0f25c802b33 ("batman-adv: Include frame priority in fragment header") Signed-off-by: Sven Eckelmann <sven.eckelmann@open-mesh.com> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
| * | | tcp: refresh tcp_mstamp from timers callbacksEric Dumazet2017-12-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only the retransmit timer currently refreshes tcp_mstamp We should do the same for delayed acks and keepalives. Even if RFC 7323 does not request it, this is consistent to what linux did in the past, when TS values were based on jiffies. Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Mike Maloney <maloney@google.com> Cc: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Mike Maloney <maloney@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>