path: root/net/netfilter/nf_conntrack_core.c
* netfilter: nf_conntrack: allow nf_ct_alloc_hashtable() to get highmem pages (Eric Dumazet, 2010-10-28; 1 file, -1/+2)
  commit ea781f197d6a8 (use SLAB_DESTROY_BY_RCU and get rid of call_rcu()) made a mistake in the __vmalloc() call in nf_ct_alloc_hashtable(): I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only.
  Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
  Cc: stable@kernel.org
  Signed-off-by: Patrick McHardy <kaber@trash.net>
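  For illustration, a minimal sketch of the corrected allocation, assuming the era's three-argument __vmalloc() signature; the surrounding sizing and fallback logic of nf_ct_alloc_hashtable() is omitted:

    #include <linux/vmalloc.h>
    #include <linux/gfp.h>
    #include <linux/list_nulls.h>

    /* Sketch: with __GFP_HIGHMEM the hash table pages may come from
     * HIGHMEM instead of being taken from LOWMEM only. */
    static void *alloc_hashtable_sketch(unsigned int nr_slots)
    {
            unsigned long sz = nr_slots * sizeof(struct hlist_nulls_head);

            return __vmalloc(sz,
                             GFP_KERNEL | __GFP_HIGHMEM | __GFP_NOWARN | __GFP_ZERO,
                             PAGE_KERNEL);
    }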
* netfilter: save the hash of the tuple in the original direction for later use (Changli Gao, 2010-09-21; 1 file, -34/+78)
  Since we don't change the tuple in the original direction, we can save its hash in ct->tuplehash[IP_CT_DIR_REPLY].hnode.pprev for __nf_conntrack_confirm() to use. __hash_conntrack() is split into two steps: hash_conntrack_raw() is used to get the raw hash, and __hash_bucket() is used to get the bucket id. In the SYN-flood case, early_drop() doesn't need to recompute the hash.
  Signed-off-by: Changli Gao <xiaosuo@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
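  A sketch of the two-step split described above; the names follow the commit message, but the real functions take more parameters (net, zone) and hash_rnd_sketch stands in for the actual seed:

    #include <linux/jhash.h>
    #include <net/netfilter/nf_conntrack_tuple.h>

    static u32 hash_rnd_sketch;     /* assumed: the random hash seed */

    /* Step 1: a full 32-bit hash of the tuple, cheap to stash away
     * and reuse in __nf_conntrack_confirm() and early_drop(). */
    static u32 hash_raw_sketch(const struct nf_conntrack_tuple *tuple)
    {
            return jhash2((const u32 *)tuple,
                          sizeof(*tuple) / sizeof(u32), hash_rnd_sketch);
    }

    /* Step 2: reduce the raw hash to a bucket index; only this step
     * depends on the current hash table size. */
    static u32 hash_bucket_sketch(u32 raw_hash, unsigned int size)
    {
            return ((u64)raw_hash * size) >> 32;
    }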
* netfilter: nf_conntrack: fix the hash random initializing race (Changli Gao, 2010-09-16; 1 file, -6/+13)
  nf_conntrack_alloc() isn't called with nf_conntrack_lock held, so the hash random initializing code may be executed more than once on different CPUs.
  Signed-off-by: Changli Gao <xiaosuo@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
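  One way to close such a race is double-checked initialization under a lock; a minimal sketch (the variable names are hypothetical, and the actual patch may differ in detail):

    #include <linux/random.h>
    #include <linux/spinlock.h>

    static u32 hash_rnd_sketch;             /* the shared seed */
    static bool hash_rnd_initted;
    static DEFINE_SPINLOCK(hash_rnd_lock);

    /* Re-check the flag under the lock so only one CPU ever writes
     * the seed, no matter how many race past the fast-path test. */
    static void init_hash_rnd_sketch(void)
    {
            if (likely(hash_rnd_initted))
                    return;
            spin_lock_bh(&hash_rnd_lock);
            if (!hash_rnd_initted) {
                    get_random_bytes(&hash_rnd_sketch, sizeof(hash_rnd_sketch));
                    hash_rnd_initted = true;
            }
            spin_unlock_bh(&hash_rnd_lock);
    }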
* netfilter: nf_conntrack_acct: use skb->len for accounting (Changli Gao, 2010-08-02; 1 file, -2/+1)
  Use skb->len for accounting, as xt_quota does. Since nf_conntrack works at the network layer, skb_network_offset() should always return zero.
  Signed-off-by: Changli Gao <xiaosuo@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge branch 'master' of /repos/git/net-next-2.6 (Patrick McHardy, 2010-06-15; 1 file, -4/+1)
  Conflicts:
    include/net/netfilter/xt_rateest.h
    net/bridge/br_netfilter.c
    net/netfilter/nf_conntrack_core.c
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* net: CONFIG_NET_NS reduction (Eric Dumazet, 2010-06-02; 1 file, -6/+2)
  Use read_pnet() and write_pnet() to reduce the number of ifdef CONFIG_NET_NS blocks.
  Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
  Signed-off-by: David S. Miller <davem@davemloft.net>
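  A sketch of the pattern, using the possible_net_t wrapper of newer kernels (the 2010 code stored a bare struct net pointer under #ifdef); the struct here is a hypothetical stand-in for the nf_conn netns field:

    #include <net/net_namespace.h>

    struct ct_net_sketch {
            possible_net_t ct_net;
    };

    /* With the helpers, no #ifdef CONFIG_NET_NS is needed at call
     * sites: read_pnet() resolves to &init_net when namespaces are
     * compiled out. */
    static inline struct net *sketch_net(const struct ct_net_sketch *s)
    {
            return read_pnet(&s->ct_net);
    }

    static inline void sketch_set_net(struct ct_net_sketch *s, struct net *net)
    {
            write_pnet(&s->ct_net, net);
    }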
* netfilter: nf_conntrack: per_cpu untracking (Eric Dumazet, 2010-06-09; 1 file, -10/+26)
  NOTRACK makes all CPUs share a cache line on nf_conntrack_untracked twice per packet, hurting performance. This patch converts it to a per_cpu variable.
  Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
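  A sketch of the conversion, using today's raw_cpu_ptr() accessor (the original used the per_cpu accessors of its era); the variable name is hypothetical:

    #include <linux/percpu.h>
    #include <net/netfilter/nf_conntrack.h>

    /* Each CPU owns its own untracked entry, so the refcount/status
     * touches done twice per NOTRACK packet no longer bounce one
     * shared cache line between all CPUs. */
    static DEFINE_PER_CPU(struct nf_conn, untracked_sketch);

    static inline struct nf_conn *untracked_get_sketch(void)
    {
            return raw_cpu_ptr(&untracked_sketch);
    }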
* netfilter: nf_conntrack: IPS_UNTRACKED bit (Eric Dumazet, 2010-06-08; 1 file, -3/+8)
  NOTRACK makes all CPUs share a cache line on nf_conntrack_untracked twice per packet. This is bad for performance, and the __read_mostly annotation is also a bad choice. This patch introduces an IPS_UNTRACKED bit so that we can later use a per_cpu untrack structure more easily. A new helper, nf_ct_untracked_get(), returns a pointer to nf_conntrack_untracked. Another one, nf_ct_untracked_status_or(), is used by nf_nat_init() to add the IPS_NAT_DONE_MASK bits to the untracked status. The nf_ct_is_untracked() prototype is changed to work on an nf_conn pointer.
  Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
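  A minimal sketch of the new test, assuming the IPS_UNTRACKED_BIT of that era: with a status bit, "is this entry untracked?" becomes a test on the entry itself rather than a pointer comparison against one global object, which is what makes per-cpu copies possible.

    #include <net/netfilter/nf_conntrack.h>
    #include <linux/netfilter/nf_conntrack_common.h>

    /* Any per-cpu copy can carry the bit; no global pointer compare. */
    static inline bool is_untracked_sketch(const struct nf_conn *ct)
    {
            return test_bit(IPS_UNTRACKED_BIT, &ct->status);
    }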
* netfilter: nf_conntrack: fix a race in __nf_conntrack_confirm against nf_ct_get_next_corpse() (Joerg Marx, 2010-05-20; 1 file, -0/+10)
  This race was triggered by a 'conntrack -F' command running in parallel to the insertion of a hash for a new connection. Losing this race led to a dead conntrack entry effectively blocking traffic for a particular connection until timeout or until the conntrack hashes were flushed again. Now the check for an already dying connection is done inside the lock.
  Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
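  A sketch of the fixed section, assuming the global nf_conntrack_lock of that era; the insertion details are elided:

    #include <net/netfilter/nf_conntrack.h>
    #include <net/netfilter/nf_conntrack_core.h>

    /* The DYING check happens inside the lock, so an entry killed by
     * nf_ct_get_next_corpse() ('conntrack -F') can no longer be
     * inserted into the hash afterwards. */
    static int confirm_sketch(struct nf_conn *ct)
    {
            spin_lock_bh(&nf_conntrack_lock);
            if (unlikely(nf_ct_is_dying(ct))) {
                    spin_unlock_bh(&nf_conntrack_lock);
                    return NF_ACCEPT;       /* entry is dead; don't insert */
            }
            /* ... hash insertion and CONFIRMED bit setting go here ... */
            spin_unlock_bh(&nf_conntrack_lock);
            return NF_ACCEPT;
    }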
* netfilter: cleanup printk messages (Stephen Hemminger, 2010-05-13; 1 file, -1/+1)
  Make sure all printk messages have a severity level.
  Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: extend with extra stat counter (Jesper Dangaard Brouer, 2010-04-23; 1 file, -1/+3)
  I suspect an unfortunate series of events occurring under a DDoS attack in the function __nf_conntrack_find() in nf_conntrack_core.c. Add a stats counter to see if the search is restarted too often.
  Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
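  A sketch of the restart path with the new counter (find_sketch() is hypothetical and omits zone handling; the counter appears as a column in /proc/net/stat/nf_conntrack):

    #include <linux/rculist_nulls.h>
    #include <net/netfilter/nf_conntrack.h>
    #include <net/netfilter/nf_conntrack_core.h>

    static struct nf_conntrack_tuple_hash *
    find_sketch(struct net *net, const struct nf_conntrack_tuple *tuple,
                unsigned int bucket)
    {
            struct nf_conntrack_tuple_hash *h;
            struct hlist_nulls_node *n;

    begin:
            hlist_nulls_for_each_entry_rcu(h, n, &net->ct.hash[bucket], hnnode) {
                    if (nf_ct_tuple_equal(tuple, &h->tuple))
                            return h;
            }
            /* A nulls value from another chain means the entry moved
             * under us mid-walk: count it and restart the search. */
            if (get_nulls_value(n) != bucket) {
                    NF_CT_STAT_INC(net, search_restart);
                    goto begin;
            }
            return NULL;
    }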
* netfilter: nf_conntrack: add support for "conntrack zones" (Patrick McHardy, 2010-02-15; 1 file, -27/+82)
  Normally, each connection needs a unique identity. Conntrack zones allow specifying a numerical zone using the CT target; connections in different zones can use the same identity. Example:
    iptables -t raw -A PREROUTING -i veth0 -j CT --zone 1
    iptables -t raw -A OUTPUT -o veth1 -j CT --zone 1
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: pass template to l4proto ->error() handler (Patrick McHardy, 2010-02-15; 1 file, -1/+2)
  The error handlers might need the template to get the conntrack zone introduced in the next patches in order to perform a conntrack lookup.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge branch 'master' of /repos/git/net-next-2.6 (Patrick McHardy, 2010-02-10; 1 file, -53/+63)
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: fix hash resizing with namespaces (Patrick McHardy, 2010-02-08; 1 file, -25/+28)
  As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but it is modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace, however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instantiating a new namespace. Additionally, restrict hash resizing to init_net for now, as other namespaces are not handled currently.
  Cc: stable@kernel.org
  Signed-off-by: Patrick McHardy <kaber@trash.net>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: nf_conntrack: per netns nf_conntrack_cachep (Eric Dumazet, 2010-02-08; 1 file, -16/+23)
  nf_conntrack_cachep is currently shared by all netns instances, but because of the special semantics of SLAB_DESTROY_BY_RCU, this is wrong. If we use a shared slab cache, one object can instantly fly from one hash table (netns ONE) to another (netns TWO), and a concurrent reader (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be fooled without notice, because no RCU grace period has to be observed between object freeing and its reuse. We don't have this problem with the UDP/TCP slab caches because the TCP/UDP hashtables are global to the machine (and each object has a pointer to its netns). If we use per netns conntrack hash tables, we also *must* use per netns conntrack slab caches, to guarantee an object cannot escape from one namespace to another.
  Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
  [Patrick: added unique slab name allocation]
  Cc: stable@kernel.org
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: fix memory corruption with multiple namespaces (Patrick McHardy, 2010-02-08; 1 file, -12/+12)
  As discovered by Jon Masters <jonathan@jonmasters.org>, the "untracked" conntrack, which is located in the data section, might be accidentally freed when a new namespace is instantiated while the untracked conntrack is attached to an skb, because the reference count is re-initialized. The best fix would be to use a separate untracked conntrack per namespace, since it includes a namespace pointer. Unfortunately this is not possible without larger changes, since the namespace is not easily available everywhere we need it. For now, move the untracked conntrack initialization to the init_net setup function to make sure the reference count is not re-initialized, and handle cleanup in the init_net cleanup function to make sure namespaces can exit properly while the untracked conntrack is in use in other namespaces.
  Cc: stable@kernel.org
  Signed-off-by: Patrick McHardy <kaber@trash.net>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: nf_conntrack: support conntrack templates (Patrick McHardy, 2010-02-03; 1 file, -16/+34)
  Support initializing selected parameters of new conntrack entries from a "conntrack template", which is a specially marked conntrack entry attached to the skb. Currently the helper and the event delivery masks can be initialized this way.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: ctnetlink: support selective event delivery (Patrick McHardy, 2010-02-03; 1 file, -1/+1)
  Add two masks for conntrack and expectation events to struct nf_conntrack_ecache and use them to filter events. Their default value is "all events" when the event sysctl is on and "no events" when it is off. A following patch will add specific initializations. Expectation events depend on the ecache struct of their master conntrack.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: split up IPCT_STATUS event (Patrick McHardy, 2010-02-03; 1 file, -1/+1)
  Split up the IPCT_STATUS event into an IPCT_REPLY event, which is generated when the IPS_SEEN_REPLY bit is set, and an IPCT_ASSURED event, which is generated when the IPS_ASSURED bit is set. In combination with a following patch to support selective event delivery, this can be used for "sparse" conntrack replication: start replicating the conntrack entry after it has reached the ASSURED state; that way it is SYN-flood resistant.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 (David S. Miller, 2009-12-03; 1 file, -4/+10)
* netfilter: nf_conntrack: avoid additional compare (Changli Gao, 2009-11-05; 1 file, -4/+10)
  Signed-off-by: Changli Gao <xiaosuo@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 (Linus Torvalds, 2009-11-09; 1 file, -0/+8)
  * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
    net/fsl_pq_mdio: add module license GPL
    can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
    can: should not use __dev_get_by_index() without locks
    hisax: remove bad udelay call to fix build error on ARM
    ipip: Fix handling of DF packets when pmtudisc is OFF
    qlge: Set PCIe reset type for EEH to fundamental.
    qlge: Fix early exit from mbox cmd complete wait.
    ixgbe: fix traffic hangs on Tx with ioatdma loaded
    ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
    ixgbe: Fix gso_max_size for 82599 when DCB is enabled
    macsonic: fix crash on PowerBook 520
    NET: cassini, fix lock imbalance
    ems_usb: Fix byte order issues on big endian machines
    be2net: Bug fix to send config commands to hardware after netdev_register
    be2net: fix to set proper flow control on resume
    netfilter: xt_connlimit: fix regression caused by zero family value
    rt2x00: Don't queue ieee80211 work after USB removal
    Revert "ipw2200: fix oops on missing firmware"
    decnet: netdevice refcount leak
    netfilter: nf_nat: fix NAT issue in 2.6.30.4+
    ...
* netfilter: nf_nat: fix NAT issue in 2.6.30.4+ (Jozsef Kadlecsik, 2009-11-06; 1 file, -0/+8)
  Vitezslav Samel discovered that since 2.6.30.4+, active FTP cannot work over NAT. The "cause" of the problem was a fix of unacknowledged data detection with NAT (commit a3a9f79e361e864f0e9d75ebe2a0cb43d17c4272). However, that fix actually uncovered a long-standing bug in TCP conntrack: when NAT was enabled, we simply updated the max of the right edge of the segments we have seen (td_end) by the offset NAT produced by changing IP/port in the data, but we did not update the other parameter (td_maxend), which is also affected by the NAT offset. Thus td_maxend could drift away from the correct value, which resulted in breaking active FTP. The patch below fixes the issue by *not* updating the conntrack parameters from NAT, but instead taking the NAT offsets into account in conntrack in a consistent way. (Updating from NAT would be harder and more expensive, because it would need to re-calculate parameters we already calculated in conntrack.)
  Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
  Signed-off-by: David S. Miller <davem@davemloft.net>
* headers: remove sched.h from interrupt.h (Alexey Dobriyan, 2009-10-11; 1 file, -0/+1)
  Now that m68k's task_thread_info() doesn't refer to current, it's possible to remove sched.h from interrupt.h and not break m68k! Many thanks to Heiko Carstens for allowing this.
  Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
* mm: replace various uses of num_physpages by totalram_pages (Jan Beulich, 2009-09-22; 1 file, -2/+2)
  Sizing of memory allocations shouldn't depend on the number of physical pages found in a system, as that generally includes (perhaps a huge amount of) non-RAM pages. The amount of what is actually usable as storage should instead be used as a basis here. Some of the calculations (i.e. those not intending to use high memory) should likely even use (totalram_pages - totalhigh_pages).
  Signed-off-by: Jan Beulich <jbeulich@novell.com>
  Acked-by: Rusty Russell <rusty@rustcorp.com.au>
  Acked-by: Ingo Molnar <mingo@elte.hu>
  Cc: Dave Airlie <airlied@linux.ie>
  Cc: Kyle McMartin <kyle@mcmartin.ca>
  Cc: Jeremy Fitzhardinge <jeremy@goop.org>
  Cc: Pekka Enberg <penberg@cs.helsinki.fi>
  Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
  Cc: "David S. Miller" <davem@davemloft.net>
  Cc: Patrick McHardy <kaber@trash.net>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* netfilter: nf_conntrack: netns fix re reliable conntrack event delivery (Alexey Dobriyan, 2009-08-31; 1 file, -3/+3)
  Conntracks on the dying list of a netns other than init_net were never killed.
  Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
  Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nfnetlink: constify message attributes and headers (Patrick McHardy, 2009-08-25; 1 file, -1/+1)
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: nf_conntrack_alloc() fixes (Eric Dumazet, 2009-07-16; 1 file, -3/+18)
  When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating objects, since the slab allocator could give us a freed object still used by lockless readers. In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next being always valid (i.e. containing a valid 'nulls' value, or a valid pointer to the next object in the hash chain). kmem_cache_zalloc() sets up the object with NULL values, but a NULL value is not valid for ct->tuplehash[xxx].hnnode.next. The fix is to call kmem_cache_alloc() and do the zeroing ourselves. As spotted by Patrick, we also need to make sure the lookup keys are committed to memory before setting the refcount to 1, or a lockless reader could get a reference on the old version of the object; its key re-check could then pass the barrier.
  Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
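  A sketch of the shape of both fixes, assuming the era's atomic_t refcount; the cache name is hypothetical and the exact zeroed range is elided:

    #include <linux/slab.h>
    #include <net/netfilter/nf_conntrack.h>

    extern struct kmem_cache *ct_cachep_sketch;     /* assumed cache */

    static struct nf_conn *alloc_sketch(const struct nf_conntrack_tuple *orig,
                                        const struct nf_conntrack_tuple *repl,
                                        gfp_t gfp)
    {
            struct nf_conn *ct;

            /* kmem_cache_zalloc() would wipe hnnode.next, which a
             * lockless reader may still be chasing; allocate without
             * zeroing and clear only the fields that may be zero. */
            ct = kmem_cache_alloc(ct_cachep_sketch, gfp);
            if (ct == NULL)
                    return NULL;

            ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig;
            ct->tuplehash[IP_CT_DIR_REPLY].tuple = *repl;

            /* Commit the lookup keys to memory before the refcount
             * store makes the object reachable to lockless readers. */
            smp_wmb();
            atomic_set(&ct->ct_general.use, 1);
            return ct;
    }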
* netfilter: nf_conntrack: fix conntrack lookup race (Patrick McHardy, 2009-06-22; 1 file, -2/+4)
  The RCU-protected conntrack hash lookup only checks whether the entry has a refcount of zero to decide whether it is stale. This is not sufficient: entries are explicitly removed while there is at least one reference left, possibly more. Explicitly check whether the entry has been marked as dying to fix this.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: fix confirmation race condition (Patrick McHardy, 2009-06-22; 1 file, -1/+8)
  New connection tracking entries are inserted into the hash before they are fully set up, namely the CONFIRMED bit is not set and the timer is not started yet. This can theoretically lead to a race with the timer, which would set the timeout value to a relative value, most likely already in the past. Perform the hash insertion as the final step to fix this.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: death_by_timeout() fix (Eric Dumazet, 2009-06-22; 1 file, -2/+8)
  death_by_timeout() might delete a conntrack from the hash list and insert it into the dying list:
    nf_ct_delete_from_lists(ct);
    nf_ct_insert_dying_list(ct);
  I believe a (lockless) reader could *catch* ct while doing a lookup and miss the end of its chain. (The nulls lookup algorithm must check the null value at the end of the lookup and should restart if the null value is not the expected one; cf. Documentation/RCU/rculist_nulls.txt for details.) We need to change nf_conntrack_init_net() and use a different "null" value, guaranteed not to be used in regular lists. Choose very large values, since the hash table uses [0..size-1] null values.
  Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
  Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
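  A sketch of the idea with the constants the patch introduced; the init function name is hypothetical:

    #include <linux/list_nulls.h>
    #include <net/net_namespace.h>
    #include <net/netfilter/nf_conntrack.h>

    /* Nulls values outside [0, hash_size), so a reader that ends its
     * walk on the unconfirmed or dying list can tell it drifted off
     * the hash chain it started from and must restart. */
    #define UNCONFIRMED_NULLS_VAL   ((1 << 30) + 0)
    #define DYING_NULLS_VAL         ((1 << 30) + 1)

    static void init_ct_lists_sketch(struct net *net)
    {
            INIT_HLIST_NULLS_HEAD(&net->ct.unconfirmed, UNCONFIRMED_NULLS_VAL);
            INIT_HLIST_NULLS_HEAD(&net->ct.dying, DYING_NULLS_VAL);
    }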
* netfilter: conntrack: optional reliable conntrack event delivery (Pablo Neira Ayuso, 2009-06-13; 1 file, -11/+78)
  This patch improves ctnetlink event reliability if one broadcast listener has set the NETLINK_BROADCAST_ERROR socket option. The logic is the following: if an event delivery fails, we keep the undelivered events in the missed-event cache. Once the next packet arrives, we add the new events (if any) to the missed events in the cache and we try a new delivery, and so on. Thus, if ctnetlink fails to deliver an event, we try to deliver it once we see a new packet. Therefore, we may lose state transitions, but the userspace process gets in sync at some point.
  In the worst case, if no events were delivered to userspace, we make sure that destroy events are successfully delivered. Basically, if ctnetlink fails to deliver the destroy event, we remove the conntrack entry from the hashes and insert it into the dying list, which contains inactive entries. Then the conntrack timer is added with an extra grace timeout of random32() % 15 seconds to trigger the event again (this grace timeout is tunable via /proc). The use of a limited random timeout value distributes the "destroy" resends, avoiding the accumulation of lots of "destroy" events at the same time. Event delivery may re-order, but we can identify events by means of the tuple plus the conntrack ID.
  The maximum number of conntrack entries (active or inactive) is still handled by nf_conntrack_max. Thus, we may start dropping packets at some point if we accumulate a lot of inactive conntrack entries that did not successfully report the destroy event to userspace. During my stress tests, consisting of setting a very small buffer of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket flag and generating lots of very small connections, I noticed very few destroy entries on the fly waiting to be resent.
  A simple way to test this patch consists of creating a lot of entries, setting a very small Netlink buffer in conntrackd (plus a patch which is not in the git tree to set the BROADCAST_ERROR flag) and invoking `conntrack -F'.
  For expectations, no changes are introduced in this patch. Currently, event delivery is only done for new expectations (no events from expectation expiration, removal and confirmation). In that case, they would need a per-expectation event cache to implement the same idea that is exposed in this patch. This patch can be useful to provide reliable flow accounting. We still have to add a new conntrack extension to store the creation and destroy time.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: conntrack: move helper destruction to nf_ct_helper_destroy() (Pablo Neira Ayuso, 2009-06-13; 1 file, -10/+1)
  This patch moves the helper destruction to a function that lives in nf_conntrack_helper.c. This new function is used in the patch that adds ctnetlink reliable event delivery.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: conntrack: move event caching to conntrack extension infrastructure (Pablo Neira Ayuso, 2009-06-13; 1 file, -7/+8)
  This patch reworks the per-cpu event caching to use the conntrack extension infrastructure. The main drawback is that we consume more memory per conntrack if event delivery is enabled. This patch is required by the reliable event delivery that follows. BTW, this patch allows you to enable/disable event delivery at runtime via /proc/sys/net/netfilter/nf_conntrack_events, although you can still disable event caching as a compilation option.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: use mod_timer_pending() for conntrack refresh (Patrick McHardy, 2009-06-13; 1 file, -11/+6)
  Use mod_timer_pending() instead of the atomic sequence of del_timer()/add_timer(). mod_timer_pending() does not rearm an inactive timer, so we don't need the conntrack lock anymore to make sure we don't accidentally rearm a timer of a conntrack which is in the process of being destroyed. With this change, we don't need to take the global lock anymore at all; counter updates can be performed under the per-conntrack lock.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
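  A sketch of the lockless refresh path, assuming ct->timeout is still a struct timer_list (as it was at the time):

    #include <linux/timer.h>
    #include <linux/jiffies.h>
    #include <net/netfilter/nf_conntrack.h>

    /* mod_timer_pending() refuses to rearm a timer that is no longer
     * pending, so a conntrack whose timeout already fired (and which
     * is being destroyed) cannot be accidentally revived. */
    static void refresh_sketch(struct nf_conn *ct, unsigned long extra_jiffies)
    {
            unsigned long newtime = jiffies + extra_jiffies;

            /* Skip updates smaller than a jiffy-scale threshold. */
            if (newtime - ct->timeout.expires >= HZ)
                    mod_timer_pending(&ct->timeout, newtime);
    }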
* netfilter: nf_conntrack: use per-conntrack locks for protocol data (Patrick McHardy, 2009-06-10; 1 file, -0/+1)
  Introduce per-conntrack locks and use them instead of the global protocol locks to avoid contention. Especially tcp_lock shows up very high in profiles on larger machines. This will also allow simplifying the upcoming reliable event delivery patches.
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: conntrack: simplify event caching system (Pablo Neira Ayuso, 2009-06-02; 1 file, -13/+1)
  This patch simplifies the conntrack event caching system by removing several events:
  * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted since they have no clients.
  * IPCT_COUNTER_FILLING, which is a leftover of the 32-bit counter days.
  * IPCT_REFRESH, which is not of any use since we always include the timeout in the messages.
  After this patch, the existing events are:
  * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which are used to identify addition and deletion of entries.
  * IPCT_STATUS, which notes that the status bits have changed, e.g. IPS_SEEN_REPLY and IPS_ASSURED.
  * IPCT_PROTOINFO, which reports that internal protocol information has changed, e.g. the TCP, DCCP and SCTP protocol state.
  * IPCT_HELPER, which notes that a helper has been assigned or unassigned to this entry.
  * IPCT_MARK and IPCT_SECMARK, which report that the mark has changed; this covers the case when a mark is set to zero.
  * IPCT_NATSEQADJ, which reports updates to the NAT sequence adjustment.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* netfilter: conntrack: don't report events on module removal (Pablo Neira Ayuso, 2009-06-02; 1 file, -5/+10)
  During module removal there are no possible event listeners, since ctnetlink must be removed first to allow removing nf_conntrack. This patch removes the event reporting for the module removal case, which is not of any use in the existing code.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 (David S. Miller, 2009-03-27; 1 file, -53/+76)
* netfilter: nf_conntrack: add generic function to get len of generic policy (Holger Eitzenberger, 2009-03-25; 1 file, -0/+6)
  Useful for all protocols which do not add additional data, such as GRE or UDPlite.
  Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu() (Eric Dumazet, 2009-03-25; 1 file, -53/+70)
  Use the "hlist_nulls" infrastructure we added in 2.6.29 for the RCUification of UDP & TCP. This permits an easy conversion from call_rcu()-based hash lists to SLAB_DESTROY_BY_RCU ones. Avoiding the call_rcu() delay at nf_conn freeing time has numerous gains:
  - It doesn't fill RCU queues (up to 10000 elements per cpu), which reduces the OOM possibility if queued elements are not taken into account, and reduces latency problems when the RCU queue size hits its high limit and triggers emergency mode.
  - It allows fast reuse of just-freed elements, permitting better use of CPU cache.
  - We delete rcu_head from "struct nf_conn", shrinking the size of this structure by 8 or 16 bytes.
  This patch only takes care of "struct nf_conn". call_rcu() is still used for less critical conntrack parts, which may be converted later if necessary.
  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
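  A sketch of the reader-side discipline SLAB_DESTROY_BY_RCU demands, reusing the hypothetical find_sketch() from the search_restart entry above: an object found on the chain may have been freed and reused, so take a reference with atomic_inc_not_zero() and re-check the tuple afterwards, retrying the lookup on a mismatch.

    static struct nf_conn *
    find_get_sketch(struct net *net, const struct nf_conntrack_tuple *tuple,
                    unsigned int bucket)
    {
            struct nf_conntrack_tuple_hash *h;
            struct nf_conn *ct = NULL;

            rcu_read_lock();
    begin:
            h = find_sketch(net, tuple, bucket);    /* lookup sketch above */
            if (h) {
                    ct = nf_ct_tuplehash_to_ctrack(h);
                    if (unlikely(!atomic_inc_not_zero(&ct->ct_general.use))) {
                            ct = NULL;              /* being freed: a miss */
                    } else if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple))) {
                            /* Reused for another flow between the lookup
                             * and the refcount: drop it and start over. */
                            nf_ct_put(ct);
                            ct = NULL;
                            goto begin;
                    }
            }
            rcu_read_unlock();
            return ct;
    }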
* netfilter: nf_conntrack: use hlist_add_head_rcu() in nf_conntrack_set_hashsize() (Eric Dumazet, 2009-03-25; 1 file, -1/+1)
  Using hlist_add_head() in nf_conntrack_set_hashsize() is quite dangerous: without any barrier, one CPU could see a loop while doing its lookup. It's true the new table cannot be seen by another CPU, but the previous table is still readable.
  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 (David S. Miller, 2009-03-24; 1 file, -5/+9)
* netfilter: nf_conntrack: Reduce conntrack count in nf_conntrack_free() (Eric Dumazet, 2009-03-24; 1 file, -2/+3)
  We use RCU to defer freeing of conntrack structures. In a DoS situation, RCU might accumulate about 10,000 elements per CPU in its internal queues. To get accurate conntrack counts (at the expense of slightly more RAM used), we might consider having the conntrack counter not take into account elements that are "about to be freed, waiting in RCU queues". We thus decrement it in nf_conntrack_free(), not in the RCU callback.
  Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
  Tested-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: nf_conntrack: account packets dropped by tcp_packet() (Pablo Neira Ayuso, 2009-02-24; 1 file, -0/+2)
  Since tcp_packet() may return -NF_DROP in two situations, the packet-drop stats must be increased.
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: fix hardcoded size assumptions (Hagen Paul Pfeifer, 2009-02-20; 1 file, -2/+3)
  get_random_bytes() is sometimes called with a hard-coded size assumption of an integer. That may not remain true forever. This patch replaces it with a compile-time sizeof() statement.
  Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
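  A minimal sketch of the pattern (the function name is hypothetical):

    #include <linux/random.h>

    static u32 pick_seed_sketch(void)
    {
            u32 rnd;

            /* sizeof(rnd) tracks the variable's type; the old code
             * hard-coded the integer's size. */
            get_random_bytes(&rnd, sizeof(rnd));
            return rnd;
    }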
* netfilter: nf_conntrack: table max size should hold at least table size (Hagen Paul Pfeifer, 2009-02-20; 1 file, -1/+1)
  The table size is defined as unsigned, whereas the table maximum size is defined as a signed integer. The max is calculated as the table size multiplied by 8 or 4; the max value is therefore changed to unsigned as well.
  Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
* netfilter: conntrack: fix dropping packet after l4proto->packet() (Christoph Paasch, 2009-03-16; 1 file, -1/+1)
  We currently use the negative value in the conntrack code to encode the packet verdict in the error. As NF_DROP is equal to 0, inverting NF_DROP makes no sense and, as a result, no packets are ever dropped.
  Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
  Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
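  A hedged fragment showing the shape of the corrected test (names as in nf_conntrack_in() of that era; surrounding declarations elided):

    /* NF_DROP == 0, so the encoded verdict -NF_DROP is also 0 and a
     * '< 0' test never catches it; '<= 0' does. */
    ret = l4proto->packet(ct, skb, dataoff, ctinfo, pf, hooknum);
    if (ret <= 0) {
            /* Inverting back yields the verdict: -(-NF_DROP) == NF_DROP. */
            return -ret;
    }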
* netfilter 07/09: simplify nf_conntrack_alloc() error handling (Julia Lawall, 2009-01-13; 1 file, -2/+2)
  nf_conntrack_alloc() cannot return NULL, so there is no need to check for NULL before using the value. I have also removed the initialization of ct to NULL in nf_conntrack_alloc(), since the value is never used, and since it might perhaps lead one to think that "return ct" at the end could return NULL. The semantic patch that finds this problem is as follows (http://www.emn.fr/x-info/coccinelle/):
    // <smpl>
    @match exists@
    expression x, E;
    position p1,p2;
    statement S1, S2;
    @@
    x@p1 = nf_conntrack_alloc(...)
    ... when != x = E
    (
     if (x@p2 == NULL || ...) S1 else S2
    |
     if (x@p2 == NULL && ...) S1 else S2
    )

    @other_match exists@
    expression match.x, E1, E2;
    position p1!=match.p1,match.p2;
    @@
    x@p1 = E1
    ... when != x = E2
    x@p2

    @ script:python depends on !other_match@
    p1 << match.p1;
    p2 << match.p2;
    @@
    print "%s: call to nf_conntrack_alloc %s bad test %s" % (p1[0].file,p1[0].line,p2[0].line)
    // </smpl>
  Signed-off-by: Julia Lawall <julia@diku.dk>
  Signed-off-by: Patrick McHardy <kaber@trash.net>
  Signed-off-by: David S. Miller <davem@davemloft.net>