summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'master' of ↵David S. Miller2011-03-0220-146/+143Star
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
| * netfilter: nf_ct_tcp: fix out of sync scenario while in SYN_RECVPablo Neira Ayuso2011-02-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the out of sync scenarios while in SYN_RECV state. Quoting Jozsef, what it happens if we are out of sync if the following: > > b. conntrack entry is outdated, new SYN received > > - (b1) we ignore it but save the initialization data from it > > - (b2) when the reply SYN/ACK receives and it matches the saved data, > > we pick up the new connection This is what it should happen if we are in SYN_RECV state. Initially, the SYN packet hits b1, thus we save data from it. But the SYN/ACK packet is considered a retransmission given that we're in SYN_RECV state. Therefore, we never hit b2 and we don't get in sync. To fix this, we ignore SYN/ACK if we are in SYN_RECV. If the previous packet was a SYN, then we enter the ignore case that get us in sync. This patch helps a lot to conntrackd in stress scenarios (assumming a client that generates lots of small TCP connections). During the failover, consider that the new primary has injected one outdated flow in SYN_RECV state (this is likely to happen if the conntrack event rate is high because the backup will be a bit delayed from the primary). With the current code, if the client starts a new fresh connection that matches the tuple, the SYN packet will be ignored without updating the state tracking, and the SYN+ACK in reply will blocked as it will not pass checkings III or IV (since all state tracking in the original direction is not initialized because of the SYN packet was ignored and the ignore case that get us in sync is not applied). I posted a couple of patches before this one. Changli Gao spotted a simpler way to fix this problem. This patch implements his idea. Cc: Changli Gao <xiaosuo@gmail.com> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * ipvs: unify the formula to estimate the overhead of processing connectionsChangli Gao2011-02-255-63/+27Star
| | | | | | | | | | | | | | | | | | | | | | lc and wlc use the same formula, but lblc and lblcr use another one. There is no reason for using two different formulas for the lc variants. The formula used by lc is used by all the lc variants in this patch. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Wensong Zhang <wensong@linux-vs.org> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: use enum to instead of magic numbersChangli Gao2011-02-241-14/+27
| | | | | | | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: use hlist instead of listChangli Gao2011-02-222-24/+30
| | | | | | | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: make "no destination available" message more informativePatrick Schaaf2011-02-1611-14/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When IP_VS schedulers do not find a destination, they output a terse "WLC: no destination available" message through kernel syslog, which I can not only make sense of because syslog puts them in a logfile together with keepalived checker results. This patch makes the output a bit more informative, by telling you which virtual service failed to find a destination. Example output: kernel: [1539214.552233] IPVS: wlc: TCP 192.168.8.30:22 - no destination available kernel: [1539299.674418] IPVS: wlc: FWM 22 0x00000016 - no destination available I have tested the code for IPv4 and FWM services, as you can see from the example; I do not have an IPv6 setup to test the third code path with. To avoid code duplication, I put a new function ip_vs_scheduler_err() into ip_vs_sched.c, and use that from the schedulers instead of calling IP_VS_ERR_RL directly. Signed-off-by: Patrick Schaaf <netdev@bof.de> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: remove extra lookups for ICMP packetsJulian Anastasov2011-02-151-25/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | Remove code that should not be called anymore. Now when ip_vs_out handles replies for local clients at LOCAL_IN hook we do not need to call conn_out_get and handle_response_icmp from ip_vs_in_icmp* because such lookups were already performed for the ICMP packet and no connection was found. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| * ipvs: fix timer in get_curr_sync_buffTinggong Wang2011-02-151-2/+2
| | | | | | | | | | | | | | | | | | | | Fix get_curr_sync_buff to keep buffer for 2 seconds as intended, not just for the current jiffie. By this way we will sync more connection structures with single packet. Signed-off-by: Tinggong Wang <wangtinggong@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
| * netfilter: nfnetlink_log: remove unused parameterFlorian Westphal2011-02-151-2/+1Star
| | | | | | | | | | Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: xt_conntrack: warn about use in raw tableJan Engelhardt2011-02-141-0/+5
| | | | | | | | | | | | | | nfct happens to run after the raw table only. Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * Revert "netfilter: xt_connlimit: connlimit-above early loop termination"Stefan Berger2011-02-141-10/+3Star
| | | | | | | | | | | | | | | | | | | | | | This reverts commit 44bd4de9c2270b22c3c898310102bc6be9ed2978. I have to revert the early loop termination in connlimit since it generates problems when an iptables statement does not use -m state --state NEW before the connlimit match extension. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * bridge: netfilter: fix information leakVasiliy Kulikov2011-02-141-0/+2
| | | | | | | | | | | | | | | | | | | | | | Struct tmp is copied from userspace. It is not checked whether the "name" field is NULL terminated. This may lead to buffer overflow and passing contents of kernel stack as a module name to try_then_request_module() and, consequently, to modprobe commandline. It would be seen by all userspace processes. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: xt_connlimit: connlimit-above early loop terminationStefan Berger2011-02-111-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch below introduces an early termination of the loop that is counting matches. It terminates once the counter has exceeded the threshold provided by the user. There's no point in continuing the loop afterwards and looking at other entries. It plays together with the following code further below: return (connections > info->limit) ^ info->inverse; where connections is the result of the counted connection, which in turn is the matches variable in the loop. So once -> matches = info->limit + 1 alias -> matches > info->limit alias -> matches > threshold we can terminate the loop. Signed-off-by: Stefan Berger <stefanb@linux.vnet.ibm.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * netfilter: ipset: add dependency on CONFIG_NETFILTER_NETLINKPatrick McHardy2011-02-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When SYSCTL and PROC_FS and NETFILTER_NETLINK are not enabled: net/built-in.o: In function `try_to_load_type': ip_set_core.c:(.text+0x3ab49): undefined reference to `nfnl_unlock' ip_set_core.c:(.text+0x3ab4e): undefined reference to `nfnl_lock' ... Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | inet: Replace left-over references to inet->corkHerbert Xu2011-03-021-2/+2
| | | | | | | | | | | | | | | | The patch to replace inet->cork with cork left out two spots in __ip_append_data that can result in bogus packet construction. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | pfkey: fix warningStephen Hemminger2011-03-021-1/+1
| | | | | | | | | | | | | | | | If CONFIG_NET_KEY_MIGRATE is not defined the arguments of pfkey_migrate stub do not match causing warning. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Make icmp route lookup code a bit clearer.David S. Miller2011-03-021-51/+66
| | | | | | | | | | | | | | | | | | | | | | The route lookup code in icmpv6_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups. Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Make icmp route lookup code a bit clearer.David S. Miller2011-03-021-79/+96
| | | | | | | | | | | | | | | | | | | | | | The route lookup code in icmp_send() is slightly tricky as a result of having to handle all of the requirements of RFC 4301 host relookups. Pull the route resolution into a seperate function, so that the error handling and route reference counting is hopefully easier to see and contained wholly within this new routine. Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm: Handle blackhole route creation via afinfo.David S. Miller2011-03-0110-67/+50Star
| | | | | | | | | | | | | | That way we don't have to potentially do this in every xfrm_lookup() caller. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Normalize arguments to ip6_dst_blackhole().David S. Miller2011-03-013-12/+9Star
| | | | | | | | | | | | | | | | | | | | Return a dst pointer which is potentitally error encoded. Don't pass original dst pointer by reference, pass a struct net instead of a socket, and elide the flow argument since it is unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
* | xfrm: Kill XFRM_LOOKUP_WAIT flag.David S. Miller2011-03-015-10/+8Star
| | | | | | | | | | | | This can be determined from the flow flags instead. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Change final dst lookup arg name to "can_sleep"David S. Miller2011-03-012-8/+8
| | | | | | | | | | | | | | Since it indicates whether we are invoked from a sleepable context or not. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Kill can_sleep arg to ip_route_output_flow()David S. Miller2011-03-0112-16/+17
| | | | | | | | | | | | This boolean state is now available in the flow flags. Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Add FLOWI_FLAG_CAN_SLEEP.David S. Miller2011-03-015-3/+11
| | | | | | | | | | | | And set is in contexts where the route resolution can sleep. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Make final arg to ip_route_output_flow to be boolean "can_sleep"David S. Miller2011-03-0112-16/+16
| | | | | | | | | | | | Since that is what the current vague "flags" argument means. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Can final ip_route_connect() arg to boolean "can_sleep".David S. Miller2011-03-016-7/+7
| | | | | | | | | | | | Since that's what the current vague "flags" thing means. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Consolidate route lookup sequences.David S. Miller2011-03-0110-165/+142Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Route lookups follow a general pattern in the ipv6 code wherein we first find the non-IPSEC route, potentially override the flow destination address due to ipv6 options settings, and then finally make an IPSEC search using either xfrm_lookup() or __xfrm_lookup(). __xfrm_lookup() is used when we want to generate a blackhole route if the key manager needs to resolve the IPSEC rules (in this case -EREMOTE is returned and the original 'dst' is left unchanged). Otherwise plain xfrm_lookup() is used and when asynchronous IPSEC resolution is necessary, we simply fail the lookup completely. All of these cases are encapsulated into two routines, ip6_dst_lookup_flow and ip6_sk_dst_lookup_flow. The latter of which handles unconnected UDP datagram sockets. Signed-off-by: David S. Miller <davem@davemloft.net>
* | udp: Add lockless transmit pathHerbert Xu2011-03-011-1/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The UDP transmit path has been running under the socket lock for a long time because of the corking feature. This means that transmitting to the same socket in multiple threads does not scale at all. However, as most users don't actually use corking, the locking can be removed in the common case. This patch creates a lockless fast path where corking is not used. Please note that this does create a slight inaccuracy in the enforcement of socket send buffer limits. In particular, we may exceed the socket limit by up to (number of CPUs) * (packet size) because of the way the limit is computed. As the primary purpose of socket buffers is to indicate congestion, this should not be a great problem for now. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | udp: Switch to ip_finish_skbHerbert Xu2011-03-013-33/+73
| | | | | | | | | | | | | | | | | | | | This patch converts UDP to use the new ip_finish_skb API. This would then allows us to more easily use ip_make_skb which allows UDP to run without a socket lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Add ip_make_skb and ip_finish_skbHerbert Xu2011-03-012-14/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the helper ip_make_skb which is like ip_append_data and ip_push_pending_frames all rolled into one, except that it does not send the skb produced. The sending part is carried out by ip_send_skb, which the transport protocol can call after it has tweaked the skb. It is meant to be called in cases where corking is not used should have a one-to-one correspondence to sendmsg. This patch also adds the helper ip_finish_skb which is meant to be replace ip_push_pending_frames when corking is required. Previously the protocol stack would peek at the socket write queue and add its header to the first packet. With ip_finish_skb, the protocol stack can directly operate on the final skb instead, just like the non-corking case with ip_make_skb. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Remove explicit write references to sk/inet in ip_append_dataHerbert Xu2011-03-012-107/+154
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to allow simultaneous calls to ip_append_data on the same socket, it must not modify any shared state in sk or inet (other than those that are designed to allow that such as atomic counters). This patch abstracts out write references to sk and inet_sk in ip_append_data and its friends so that we may use the underlying code in parallel. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Remove unused sk_sndmsg_* from UFOHerbert Xu2011-03-013-5/+0Star
| | | | | | | | | | | | | | | | | | | | UFO doesn't really use the sk_sndmsg_* parameters so touching them is pointless. It can't use them anyway since the whole point of UFO is to use the original pages without copying. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'for-davem' of ↵David S. Miller2011-03-0130-220/+150Star
|\ \ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/bwh/sfc-next-2.6
| * | sfc: Bump version to 3.1Ben Hutchings2011-03-011-1/+1
| | | | | | | | | | | | | | | | | | | | | All features originally planned for version 3.1 (and some that weren't) have been implemented. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | sfc: Remove configurable FIFO thresholds for pause frame generationBen Hutchings2011-03-014-41/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In Falcon we can configure the fill levels of the RX data FIFO which trigger the generation of pause frames (if enabled), and we have module parameters for this. Siena does not allow the levels to be configured (or, if it does, this is done by the MC firmware and is not configurable by drivers). So far as I can tell, the module parameters are not used by our internal scripts and have not been documented (with the exception of the short parameter descriptions). Therefore, remove them and always initialise Falcon with the default values. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | sfc: Expose TX push and TSO counters through ethtool statisticsBen Hutchings2011-03-011-1/+20
| | | | | | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | sfc: Update copyright datesBen Hutchings2011-03-0130-30/+30
| | | | | | | | | | | | Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | sfc: Do not read STAT1.FAULT in efx_mdio_check_mmd()Ben Hutchings2011-03-014-31/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This field does not exist in all MMDs we want to check, and all callers allow it to be set (fault_fatal = 0). Remove the loopback condition, as STAT2.DEVPRST should be valid regardless of any fault. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | sfc: Read MC firmware version when requested through ethtoolBen Hutchings2011-03-015-41/+9Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently make no use of siena_nic_data::fw_{version,build} except to format the firmware version for ethtool_get_drvinfo(). Since we only read the version at start of day, this information is incorrect after an MC firmware update. Remove the cached version information and read it via MCDI whenever it is requested. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | sfc: Reduce size of efx_rx_buffer further by removing data memberSteve Hodgson2011-03-012-24/+28
| | | | | | | | | | | | | | | | | | | | | Instead calculate the KVA of receive data. It's not like it's a hard sum. [bwh: Fixed to work with GRO.] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
| * | sfc: Reduce size of efx_rx_buffer by unionising skb and pageSteve Hodgson2011-03-012-53/+51Star
| | | | | | | | | | | | | | | [bwh: Forward-ported to net-next-2.6.] Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
* | | bonding: use the correct size for _simple_hash()Amerigo Wang2011-02-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Clearly it should be the size of ->ip_dst here. Although this is harmless, but it still reads odd. Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | enic: Flush driver cache of registered addr lists during port profile ↵Roopa Prabhu2011-02-282-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disassociate During a port profile disassociate all address registrations for the interface are blown away from the adapter. This patch resets the driver cache of registered address lists to zero after a port profile disassociate. Signed-off-by: Roopa Prabhu <roprabhu@cisco.com> Signed-off-by: David Wang <dwang2@cisco.com> Signed-off-by: Christian Benvenuti <benve@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | DM9000: Allow randomised ethernet addressBen Dooks2011-02-281-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Allow randomised ethernet address if the device does not have a valid EEPROM or pre-set MAC address. Signed-off-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | qla3xxx: add missing __iomem annotationstephen hemminger2011-02-281-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add necessary annotations about pointer to io memory space that is checked by sparse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | bonding: fix sparse warningstephen hemminger2011-02-281-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | Fix use of zero where NULL expected. And wrap long line. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: TX timestamps for IPv6 UDP packetsAnders Berggren2011-02-281-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Enabling TX timestamps (SO_TIMESTAMPING) for IPv6 UDP packets, in the same fashion as for IPv4. Necessary in order for NICs such as Intel 82580 to timestamp IPv6 packets. Signed-off-by: Anders Berggren <anders@halon.se> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sis900: use pci_dev->revisionSergei Shtylyov2011-02-281-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | This driver uses PCI_CLASS_REVISION instead of PCI_REVISION_ID, so it wasn't converted by commit 44c10138fd4bbc4b6d6bff0873c24902f2a9da65 (PCI: Change all drivers to use pci_device->revision). Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | llc: avoid skb_clone() if there is only one handlerChangli Gao2011-02-281-12/+13
| | | | | | | | | | | | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | bnx2x: use dcb_setapp to manage negotiated application tlvsShmulik Ravid2011-02-284-59/+92
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this patch the bnx2x uses the generic dcbnl application tlv list instead of implementing its own get-app handler. When the driver is alerted to a change in the DCB negotiated parameters, it calls dcb_setapp to update the dcbnl application tlvs list making it available to user mode applications and registered notifiers. Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>