summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * rxrpc: Fix uninitialised variable warningDavid Howells2016-09-021-3/+2Star
| | | | | | | | | | | | | | | | | | | | | | Fix the following uninitialised variable warning: ../net/rxrpc/call_event.c: In function 'rxrpc_process_call': ../net/rxrpc/call_event.c:879:58: warning: 'error' may be used uninitialized in this function [-Wmaybe-uninitialized] _debug("post net error %d", error); ^ Signed-off-by: David Howells <dhowells@redhat.com>
| * rxrpc: fix undefined behavior in rxrpc_mark_call_releasedArnd Bergmann2016-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc -Wmaybe-initialized correctly points out a newly introduced bug through which we can end up calling rxrpc_queue_call() for a dead connection: net/rxrpc/call_object.c: In function 'rxrpc_mark_call_released': net/rxrpc/call_object.c:600:5: error: 'sched' may be used uninitialized in this function [-Werror=maybe-uninitialized] This sets the 'sched' variable to zero to restore the previous behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: f5c17aaeb2ae ("rxrpc: Calls should only have one terminal state") Signed-off-by: David Howells <dhowells@redhat.com>
* | vxlan: Update tx_errors statistics if vxlan_build_skb return err.Haishuang Yan2016-09-061-0/+1
| | | | | | | | | | | | | | | | If vxlan_build_skb return err < 0, tx_errors should be also increased. Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net/mlx4_en: protect ring->xdp_prog with rcu_read_lockBrenden Blanco2016-09-063-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Depending on the preempt mode, the bpf_prog stored in xdp_prog may be freed despite the use of call_rcu inside bpf_prog_put. The situation is possible when running in PREEMPT_RCU=y mode, for instance, since the rcu callback for destroying the bpf prog can run even during the bh handling in the mlx4 rx path. Several options were considered before this patch was settled on: Add a napi_synchronize loop in mlx4_xdp_set, which would occur after all of the rings are updated with the new program. This approach has the disadvantage that as the number of rings increases, the speed of update will slow down significantly due to napi_synchronize's msleep(1). Add a new rcu_head in bpf_prog_aux, to be used by a new bpf_prog_put_bh. The action of the bpf_prog_put_bh would be to then call bpf_prog_put later. Those drivers that consume a bpf prog in a bh context (like mlx4) would then use the bpf_prog_put_bh instead when the ring is up. This has the problem of complexity, in maintaining proper refcnts and rcu lists, and would likely be harder to review. In addition, this approach to freeing must be exclusive with other frees of the bpf prog, for instance a _bh prog must not be referenced from a prog array that is consumed by a non-_bh prog. The placement of rcu_read_lock in this patch is functionally the same as putting an rcu_read_lock in napi_poll. Actually doing so could be a potentially controversial change, but would bring the implementation in line with sk_busy_loop (though of course the nature of those two paths is substantially different), and would also avoid future copy/paste problems with future supporters of XDP. Still, this patch does not take that opinionated option. Testing was done with kernels in either PREEMPT_RCU=y or CONFIG_PREEMPT_VOLUNTARY=y+PREEMPT_RCU=n modes, with neither exhibiting any drawback. With PREEMPT_RCU=n, the extra call to rcu_read_lock did not show up in the perf report whatsoever, and with PREEMPT_RCU=y the overhead of rcu_read_lock (according to perf) was the same before/after. In the rx path, rcu_read_lock is eventually called for every packet from netif_receive_skb_internal, so the napi poll call's rcu_read_lock is easily amortized. v2: Remove extra rcu_read_lock in mlx4_en_process_rx_cq body Annotate xdp_prog with __rcu, and convert all usages to rcu_assign or rcu_dereference[_protected] as appropriate. Add explicit mutex lock around rcu_assign instead of xchg loop. Fixes: d576acf0a22 ("net/mlx4_en: add page recycle to prepare rx ring for tx support") Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Brenden Blanco <bblanco@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'mediatek-rx-path-enhancements'David S. Miller2016-09-061-11/+15
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sean Wang says: ==================== net: ethernet: mediatek: add enhancements to RX path Changes since v1: - fix message typos and add coverletter Changes since v2: - split from the previous series for submitting add enhancements as a series targeting 'net-next' and add indents before comments. Changes since v3: - merge the patch using PDMA RX path - fixed the input of mtk_poll_rx is with the remaining budget Changes since v4: - save one wmb and register update when no packet is being handled inside mtk_poll_rx call - fixed incorrect return packet count from mtk_napi_rx ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ethernet: mediatek: enhance RX path by aggregating more SKBs into NAPISean Wang2016-09-061-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch adds support for aggregating more SKBs feed into NAPI in order to get more benefits from generic receive offload (GRO) by peeking at the RX ring status and moving more packets right before returning from NAPI RX polling handler if NAPI budgets are still available and some packets already present in hardware. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ethernet: mediatek: enhance RX path by reducing the frequency of the ↵Sean Wang2016-09-061-5/+6
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | memory barrier used The patch makes move wmb() to outside the loop that could help RX path handling more faster although that RX descriptors aren't freed for DMA to use as soon as possible, but based on my experiment and the result shows it still can reach about 943Mbpis without performance drop that is tested based on the setup with one port using Giga PHY and 256 RX descriptors for DMA to move. Signed-off-by: Sean Wang <sean.wang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'hso-neatening'David S. Miller2016-09-061-63/+55Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Joe Perches says: ==================== hso: neatening This seems to be the only code in the kernel that uses macro defines with a trailing underscore. Fix that. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | hso: Convert printk to pr_<level>Joe Perches2016-09-061-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a more common logging style Miscellanea: o Add pr_fmt to prefix each output message o Realign arguments Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | hso: Use a more common logging styleJoe Perches2016-09-061-53/+44Star
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | Macros that end in an underscore are just odd. Add hso_dbg(level, fmt, ...) and use it everwhere instead. Several uses had additional unnecessary newlines as the macro added a newline. Remove the newline from the macro and add newlines to each use as appropriate. Remove now unused D<digit> macros. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | smsc95xx: Add mdix control via ethtoolWoojung Huh2016-09-061-3/+106
| | | | | | | | | | | | | | Add mdix control through ethtool. Signed-off-by: Woojung Huh <Woojung.huh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | smsc95xx: Add register defineWoojung Huh2016-09-061-0/+8
| | | | | | | | | | | | | | Add STRAP_STATUS defines. Signed-off-by: Woojung Huh <Woojung.huh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | smsc95xx: Add maintainerWoojung Huh2016-09-061-0/+1
| | | | | | | | | | | | | | | | | | Add Microchip Linux Driver Support as maintainer because this driver is maintaining by Microchip. Signed-off-by: Woojung Huh <Woojung.huh@gmail.com> Acked-by: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'mv88e6xxx-isolate-Global2'David S. Miller2016-09-066-451/+596
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vivien Didelot says: ==================== net: dsa: mv88e6xxx: isolate Global2 support Registers of Marvell chips are organized in internal SMI devices. One of them at address 0x1C is called Global2. It provides an extended set of registers, used for interrupt control, EEPROM access, indirect PHY access (to bypass the PHY Polling Unit) and cross-chip setup. Most chips have it, but some others don't (older ones such as 6060). Now that its related code is isolated in mv88e6xxx_g2_* functions, move it to its own global2.c file, making most of its setup code static. Then make its compilation optional, which allows to reduce the size of the mv88e6xxx driver for devices such as home routers embedding Ethernet chips without Global2 support. It is present on most recent chips, thus enable its support by default. Changes in v2: fail probe if GLOBAL2 is required but not enabled. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dsa: mv88e6xxx: make global2 code optionalVivien Didelot2016-09-064-1/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since not every chip has a Global2 set of registers, make its support optional, in which case the related functions will return -EOPNOTSUPP. This also allows to reduce the size of the mv88e6xxx driver for devices such as home routers embedding Ethernet chips without Global2 support. It is present on most recent chips, thus enable its support by default. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dsa: mv88e6xxx: move Global2 codeVivien Didelot2016-09-065-450/+521
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marvell chips are composed of multiple SMI devices. One of them at address 0x1C is called Global2. It provides an extended set of registers, used for interrupt control, EEPROM access, indirect PHY access (to bypass the PHY Polling Unit) and cross-chip related setup. Most chips have it, but some others don't (older ones such as 6060). Now that its related code is isolated in mv88e6xxx_g2_* functions, move it to its own global2.c file, making most of its setup code static. Document each registers in the meantime. Its compilation can be later avoided for chips without such registers. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: dsa: mv88e6xxx: fix module namingVivien Didelot2016-09-061-1/+2
|/ / | | | | | | | | | | | | | | | | Since the mv88e6xxx.c file has been renamed, the driver compiled as a module is called chip.ko instead of mv88e6xxx.ko. Fix this. Fixes: fad09c73c270 ("net: dsa: mv88e6xxx: rename single-chip support") Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2016-09-0646-1613/+1380Star
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for your net-next tree. Most relevant updates are the removal of per-conntrack timers to use a workqueue/garbage collection approach instead from Florian Westphal, the hash and numgen expression for nf_tables from Laura Garcia, updates on nf_tables hash set to honor the NLM_F_EXCL flag, removal of ip_conntrack sysctl and many other incremental updates on our Netfilter codebase. More specifically, they are: 1) Retrieve only 4 bytes to fetch ports in case of non-linear skb transport area in dccp, sctp, tcp, udp and udplite protocol conntrackers, from Gao Feng. 2) Missing whitespace on error message in physdev match, from Hangbin Liu. 3) Skip redundant IPv4 checksum calculation in nf_dup_ipv4, from Liping Zhang. 4) Add nf_ct_expires() helper function and use it, from Florian Westphal. 5) Replace opencoded nf_ct_kill() call in IPVS conntrack support, also from Florian. 6) Rename nf_tables set implementation to nft_set_{name}.c 7) Introduce the hash expression to allow arbitrary hashing of selector concatenations, from Laura Garcia Liebana. 8) Remove ip_conntrack sysctl backward compatibility code, this code has been around for long time already, and we have two interfaces to do this already: nf_conntrack sysctl and ctnetlink. 9) Use nf_conntrack_get_ht() helper function whenever possible, instead of opencoding fetch of hashtable pointer and size, patch from Liping Zhang. 10) Add quota expression for nf_tables. 11) Add number generator expression for nf_tables, this supports incremental and random generators that can be combined with maps, very useful for load balancing purpose, again from Laura Garcia Liebana. 12) Fix a typo in a debug message in FTP conntrack helper, from Colin Ian King. 13) Introduce a nft_chain_parse_hook() helper function to parse chain hook configuration, this is used by a follow up patch to perform better chain update validation. 14) Add rhashtable_lookup_get_insert_key() to rhashtable and use it from the nft_set_hash implementation to honor the NLM_F_EXCL flag. 15) Missing nulls check in nf_conntrack from nf_conntrack_tuple_taken(), patch from Florian Westphal. 16) Don't use the DYING bit to know if the conntrack event has been already delivered, instead a state variable to track event re-delivery states, also from Florian. 17) Remove the per-conntrack timer, use the workqueue approach that was discussed during the NFWS, from Florian Westphal. 18) Use the netlink conntrack table dump path to kill stale entries, again from Florian. 19) Add a garbage collector to get rid of stale conntracks, from Florian. 20) Reschedule garbage collector if eviction rate is high. 21) Get rid of the __nf_ct_kill_acct() helper. 22) Use ARPHRD_ETHER instead of hardcoded 1 from ARP logger. 23) Make nf_log_set() interface assertive on unsupported families. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: log: Check param to avoid overflow in nf_log_setGao Feng2016-08-306-13/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | The nf_log_set is an interface function, so it should do the strict sanity check of parameters. Convert the return value of nf_log_set as int instead of void. When the pf is invalid, return -EOPNOTSUPP. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: log_arp: Use ARPHRD_ETHER instead of literal '1'Gao Feng2016-08-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | There is one macro ARPHRD_ETHER which defines the ethernet proto for ARP, so we could use it instead of the literal number '1'. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: remove __nf_ct_kill_acct helperFlorian Westphal2016-08-302-17/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | After timer removal this just calls nf_ct_delete so remove the __ prefix version and make nf_ct_kill a shorthand for nf_ct_delete. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: resched gc again if eviction rate is highFlorian Westphal2016-08-301-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we evicted a large fraction of the scanned conntrack entries re-schedule the next gc cycle for immediate execution. This triggers during tests where load is high, then drops to zero and many connections will be in TW/CLOSE state with < 30 second timeouts. Without this change it will take several minutes until conntrack count comes back to normal. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: add gc worker to remove timed-out entriesFlorian Westphal2016-08-301-0/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conntrack gc worker to evict stale entries. GC happens once every 5 seconds, but we only scan at most 1/64th of the table (and not more than 8k) buckets to avoid hogging cpu. This means that a complete scan of the table will take several minutes of wall-clock time. Considering that the gc run will never have to evict any entries during normal operation because those will happen from packet path this should be fine. We only need gc to make sure userspace (conntrack event listeners) eventually learn of the timeout, and for resource reclaim in case the system becomes idle. We do not disable BH and cond_resched for every bucket so this should not introduce noticeable latencies either. A followup patch will add a small change to speed up GC for the extreme case where most entries are timed out on an otherwise idle system. v2: Use cond_resched_rcu_qs & add comment wrt. missing restart on nulls value change in gc worker, suggested by Eric Dumazet. v3: don't call cancel_delayed_work_sync twice (again, Eric). Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: evict stale entries on netlink dumpsFlorian Westphal2016-08-301-1/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When dumping we already have to look at the entire table, so we might as well toss those entries whose timeout value is in the past. We also look at every entry during resize operations. However, eviction there is not as simple because we hold the global resize lock so we can't evict without adding a 'expired' list to drop from later. Considering that resizes are very rare it doesn't seem worth doing it. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: get rid of conntrack timerFlorian Westphal2016-08-305-63/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With stats enabled this eats 80 bytes on x86_64 per nf_conn entry, as Eric Dumazet pointed out during netfilter workshop 2016. Eric also says: "Another reason was the fact that Thomas was about to change max timer range [..]" (500462a9de657f8, 'timers: Switch to a non-cascading wheel'). Remove the timer and use a 32bit jiffies value containing timestamp until entry is valid. During conntrack lookup, even before doing tuple comparision, check the timeout value and evict the entry in case it is too old. The dying bit is used as a synchronization point to avoid races where multiple cpus try to evict the same entry. Because lookup is always lockless, we need to bump the refcnt once when we evict, else we could try to evict already-dead entry that is being recycled. This is the standard/expected way when conntrack entries are destroyed. Followup patches will introduce garbage colliction via work queue and further places where we can reap obsoleted entries (e.g. during netlink dumps), this is needed to avoid expired conntracks from hanging around for too long when lookup rate is low after a busy period. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: don't rely on DYING bit to detect when destroy event was sentFlorian Westphal2016-08-302-13/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The reliable event delivery mode currently (ab)uses the DYING bit to detect which entries on the dying list have to be skipped when re-delivering events from the eache worker in reliable event mode. Currently when we delete the conntrack from main table we only set this bit if we could also deliver the netlink destroy event to userspace. If we fail we move it to the dying list, the ecache worker will reattempt event delivery for all confirmed conntracks on the dying list that do not have the DYING bit set. Once timer is gone, we can no longer use if (del_timer()) to detect when we 'stole' the reference count owned by the timer/hash entry, so we need some other way to avoid racing with other cpu. Pablo suggested to add a marker in the ecache extension that skips entries that have been unhashed from main table but are still waiting for the last reference count to be dropped (e.g. because one skb waiting on nfqueue verdict still holds a reference). We do this by adding a tristate. If we fail to deliver the destroy event, make a note of this in the eache extension. The worker can then skip all entries that are in a different state. Either they never delivered a destroy event, e.g. because the netlink backend was not loaded, or redelivery took place already. Once the conntrack timer is removed we will now be able to replace del_timer() test with test_and_set_bit(DYING, &ct->status) to avoid racing with other cpu that tries to evict the same conntrack. Because DYING will then be set right before we report the destroy event we can no longer skip event reporting when dying bit is set. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: restart search if moved to other chainFlorian Westphal2016-08-301-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case nf_conntrack_tuple_taken did not find a conflicting entry check that all entries in this hash slot were tested and restart in case an entry was moved to another chain. Reported-by: Eric Dumazet <edumazet@google.com> Fixes: ea781f197d6a ("netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()") Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: Use nla_put_be32() to dump immediate parametersPablo Neira Ayuso2016-08-262-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | nft_dump_register() should only be used with registers, not with immediates. Fixes: cb1b69b0b15b ("netfilter: nf_tables: add hash expression") Fixes: 91dbc6be0a62("netfilter: nf_tables: add number generator expression") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: honor NLM_F_EXCL flag in set element insertionPablo Neira Ayuso2016-08-264-14/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the NLM_F_EXCL flag is set, then new elements that clash with an existing one return EEXIST. In case you try to add an element whose data area differs from what we have, then this returns EBUSY. If no flag is specified at all, then this returns success to userspace. This patch also update the set insert operation so we can fetch the existing element that clashes with the one you want to add, we need this to make sure the element data doesn't differ. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | rhashtable: add rhashtable_lookup_get_insert_key()Pablo Neira Ayuso2016-08-262-16/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch modifies __rhashtable_insert_fast() so it returns the existing object that clashes with the one that you want to insert. In case the object is successfully inserted, NULL is returned. Otherwise, you get an error via ERR_PTR(). This patch adapts the existing callers of __rhashtable_insert_fast() so they handle this new logic, and it adds a new rhashtable_lookup_get_insert_key() interface to fetch this existing object. nf_tables needs this change to improve handling of EEXIST cases via honoring the NLM_F_EXCL flag and by checking if the data part of the mapping matches what we have. Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
| * | netfilter: nf_tables: reject hook configuration updates on existing chainsPablo Neira Ayuso2016-08-231-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, if you add a base chain whose name clashes with an existing non-base chain, nf_tables doesn't complain about this. Similarly, if you update the chain type, the hook number and priority. With this patch, nf_tables bails out in case any of this unsupported operations occur by returning EBUSY. # nft add table x # nft add chain x y # nft add chain x y { type nat hook input priority 0\; } <cmdline>:1:1-49: Error: Could not process rule: Device or resource busy add chain x y { type nat hook input priority 0; } ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: introduce nft_chain_parse_hook()Pablo Neira Ayuso2016-08-231-63/+89
| | | | | | | | | | | | | | | | | | | | | Introduce a new function to wrap the code that parses the chain hook configuration so we can reuse this code to validate chain updates. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: typo in trace attribute definitionPablo Neira2016-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | Should be attributes, instead of attibutes, for consistency with other definitions. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_hash: fix non static symbol warningWei Yongjun2016-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes the following sparse warning: net/netfilter/nft_hash.c:40:25: warning: symbol 'nft_hash_policy' was not declared. Should it be static? Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: fix spelling mistake: "delimitter" -> "delimiter"Colin Ian King2016-08-221-1/+1
| | | | | | | | | | | | | | | | | | | | | trivial fix to spelling mistake in pr_debug message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: add number generator expressionLaura Garcia Liebana2016-08-224-0/+223
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the numgen expression that allows us to generated incremental and random numbers, this generator is bound to a upper limit that is specified by userspace. This expression is useful to distribute packets in a round-robin fashion as well as randomly. Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: add quota expressionPablo Neira Ayuso2016-08-224-0/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the quota expression. This new stateful expression integrate easily into the dynset expression to build 'hashquota' flow tables. Arguably, we could use instead "counter bytes > 1000" instead, but this approach has several problems: 1) We only support for one single stateful expression in dynamic set definitions, and the expression above is a composite of two expressions: get counter + comparison. 2) We would need to restore the packed counter representation (that we used to have) based on seqlock to synchronize this, since per-cpu is not suitable for this. So instead of bloating the counter expression back with the seqlock representation and extending the existing set infrastructure to make it more complex for the composite described above, let's follow the more simple approach of adding a quota expression that we can plug into our existing infrastructure. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_conntrack: restore nf_conntrack_htable_size as exported symbolPablo Neira Ayuso2016-08-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is required to iterate over the hash table in cttimeout, ctnetlink and nf_conntrack_ipv4. >> ERROR: "nf_conntrack_htable_size" [net/netfilter/nfnetlink_cttimeout.ko] undefined! ERROR: "nf_conntrack_htable_size" [net/netfilter/nf_conntrack_netlink.ko] undefined! ERROR: "nf_conntrack_htable_size" [net/ipv4/netfilter/nf_conntrack_ipv4.ko] undefined! Fixes: adf0516845bcd0 ("netfilter: remove ip_conntrack* sysctl compat code") Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: simplify the code by using nf_conntrack_get_htLiping Zhang2016-08-183-39/+30Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 64b87639c9cb ("netfilter: conntrack: fix race between nf_conntrack proc read and hash resize") introduce the nf_conntrack_get_ht, so there's no need to check nf_conntrack_generation again and again to get the hash table and hash size. And convert nf_conntrack_get_ht to inline function here. Suggested-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: remove ip_conntrack* sysctl compat codePablo Neira Ayuso2016-08-1313-1009/+7Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This backward compatibility has been around for more than ten years, since Yasuyuki Kozakai introduced IPv6 in conntrack. These days, we have alternate /proc/net/nf_conntrack* entries, the ctnetlink interface and the conntrack utility got adopted by many people in the user community according to what I observed on the netfilter user mailing list. So let's get rid of this. Note that nf_conntrack_htable_size and unsigned int nf_conntrack_max do not need to be exported as symbol anymore. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: add hash expressionLaura Garcia Liebana2016-08-124-0/+163
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a new hash expression, this provides jhash support but this can be extended to support for other hash functions. The modulus and seed already comes embedded into this new expression. Use case example: ... meta mark set hash ip saddr mod 10 Signed-off-by: Laura Garcia Liebana <nevola@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables: rename set implementationsPablo Neira Ayuso2016-08-124-4/+4
| | | | | | | | | | | | | | | | | | | | | Use nft_set_* prefix for backend set implementations, thus we can use nft_hash for the new hash expression. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | ipvs: use nf_ct_kill helperFlorian Westphal2016-08-121-5/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | Once timer is removed from nf_conn struct we cannot open-code the removal sequence anymore. Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: use_nf_conn_expires helper in more placesFlorian Westphal2016-08-124-11/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | ... so we don't need to touch all of these places when we get rid of the timer in nf_conn. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_dup4: remove redundant checksum recalculationLiping Zhang2016-08-121-6/+4Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IP header checksum will be recalculated at ip_local_out, so there's no need to calculated it here, remove it. Also update code comments to illustrate it, and delete the misleading comments about checksum recalculation. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: physdev: add missed blankHangbin Liu2016-08-121-2/+2
| | | | | | | | | | | | | | | Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: conntrack: Only need first 4 bytes to get l4proto portsGao Feng2016-08-125-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | We only need first 4 bytes instead of 8 bytes to get the ports of tcp/udp/dccp/sctp/udplite in their pkt_to_tuple function. Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | net: ti: cpmac: Fix compiler warning due to type confusionPaul Burton2016-09-041-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpmac_start_xmit() used the max() macro on skb->len (an unsigned int) and ETH_ZLEN (a signed int literal). This led to the following compiler warning: In file included from include/linux/list.h:8:0, from include/linux/module.h:9, from drivers/net/ethernet/ti/cpmac.c:19: drivers/net/ethernet/ti/cpmac.c: In function 'cpmac_start_xmit': include/linux/kernel.h:748:17: warning: comparison of distinct pointer types lacks a cast (void) (&_max1 == &_max2); \ ^ drivers/net/ethernet/ti/cpmac.c:560:8: note: in expansion of macro 'max' len = max(skb->len, ETH_ZLEN); ^ On top of this, it assigned the result of the max() macro to a signed integer whilst all further uses of it result in it being cast to varying widths of unsigned integer. Fix this up by using max_t to ensure the comparison is performed as unsigned integers, and for consistency change the type of the len variable to unsigned int. Signed-off-by: Paul Burton <paul.burton@imgtec.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | cxgb4: Add support for ndo_get_vf_configHariprasad Shenai2016-09-043-2/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for ndo_get_vf_config, also fill the default mac address that will be provided to the VF by firmware, in case user doesn't provide one. So user can get the default MAC address address also through ndo_get_vf_config. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'netns-opt'David S. Miller2016-09-042-22/+17Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cong Wang says: ==================== net: some minor optimization for netns id Cong Wang (2): vxlan: call peernet2id() in fdb notification netns: avoid disabling irq for netns id ==================== Signed-off-by: David S. Miller <davem@davemloft.net>