path: root/net/ipv4/tcp.c
* [NET]: Verify gso_type too in gso_segment (Herbert Xu, 2006-07-04; 1 file, -1/+12)

    We don't want nasty Xen guests to pass a TCPv6 packet in with gso_type
    set to TCPv4 or even UDP (or a packet that's both TCP and UDP).

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
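    A minimal sketch of the kind of check this adds inside the TCPv4
    gso_segment path (illustrative, not the literal diff):

        /* Only TCPv4, optionally marked DODGY, is acceptable here; any
         * other bit (SKB_GSO_TCPV6, SKB_GSO_UDP, ...) means the GSO
         * metadata was forged by the untrusted sender. */
        unsigned int type = skb_shinfo(skb)->gso_type;

        if (type & ~(SKB_GSO_TCPV4 | SKB_GSO_DODGY) ||
            !(type & SKB_GSO_TCPV4))
                return ERR_PTR(-EINVAL);
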
* Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (Linus Torvalds, 2006-07-01; 1 file, -5/+12)

    * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
      [IPV6]: Added GSO support for TCPv6
      [NET]: Generalise TSO-specific bits from skb_setup_caps
      [IPV6]: Added GSO support for TCPv6
      [IPV6]: Remove redundant length check on input
      [NETFILTER]: SCTP conntrack: fix crash triggered by packet without chunks
      [TG3]: Update version and reldate
      [TG3]: Add TSO workaround using GSO
      [TG3]: Turn on hw fix for ASF problems
      [TG3]: Add rx BD workaround
      [TG3]: Add tg3_netif_stop() in vlan functions
      [TCP]: Reset gso_segs if packet is dodgy
| * [NET]: Generalise TSO-specific bits from skb_setup_caps (Herbert Xu, 2006-06-30; 1 file, -1/+1)

      This patch generalises the TSO-specific bits from sk_setup_caps by
      adding the sk_gso_type member to struct sock. This makes
      sk_setup_caps generic so that it can be used by TCPv6 or UFO.

      The only catch is that whoever uses this must provide a GSO
      implementation for their protocol, which I think is a fair deal :)
      For now UFO continues to live without a GSO implementation, which is
      OK since it doesn't use the sock caps field at the moment.

      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
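      A rough sketch of what the generalised helper looks like under this
      scheme (a sketch based on the description, not the exact patch;
      sk_gso_type holds the SKB_GSO_* base type the socket will emit):

          void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
          {
                  __sk_dst_set(sk, dst);
                  sk->sk_route_caps = dst->dev->features;
                  if (sk->sk_route_caps & NETIF_F_GSO)
                          sk->sk_route_caps |= NETIF_F_GSO_MASK;
                  if (sk->sk_route_caps &
                      (sk->sk_gso_type << NETIF_F_GSO_SHIFT)) {
                          /* The route supports this socket's GSO type, so
                           * scatter-gather and hw checksums are usable. */
                          if (dst->header_len)
                                  sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
                          else
                                  sk->sk_route_caps |=
                                          NETIF_F_SG | NETIF_F_HW_CSUM;
                  }
          }
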
| * [IPV6]: Added GSO support for TCPv6 (Herbert Xu, 2006-06-30; 1 file, -0/+1)

      This patch adds GSO support for IPv6 and TCPv6.

      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
| * [TCP]: Reset gso_segs if packet is dodgy (Herbert Xu, 2006-06-30; 1 file, -4/+10)

      I wasn't paranoid enough in verifying GSO information. A bogus
      gso_segs could upset drivers as much as a bogus header would. Let's
      reset it in the per-protocol gso_segment functions.

      I didn't verify gso_size because that can be verified by the source
      of the dodgy packets.

      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
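      The reset itself is a one-liner; a sketch of the dodgy branch in
      tcp_tso_segment() (close to, but not guaranteed to match, the
      literal patch):

          if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
                  /* Packet is from an untrusted source: recompute
                   * gso_segs from lengths we can trust ourselves. */
                  int mss = skb_shinfo(skb)->gso_size;

                  skb_shinfo(skb)->gso_segs = (skb->len + mss - 1) / mss;

                  segs = NULL;    /* hand back to hardware untouched */
                  goto out;
          }
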
* | Remove obsolete #include <linux/config.h> (Jörn Engel, 2006-06-30; 1 file, -1/+0)

    Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
    Signed-off-by: Adrian Bunk <bunk@stusta.de>
* [NET]: Added GSO header verification (Herbert Xu, 2006-06-30; 1 file, -2/+6)

    When GSO packets come from an untrusted source (e.g., a Xen guest
    domain), we need to verify the header integrity before passing it to
    the hardware. Since the first step in GSO is to verify the header, we
    can reuse that code by adding a new bit to gso_type: SKB_GSO_DODGY.
    Packets with this bit set can only be fed directly to devices with the
    corresponding bit NETIF_F_GSO_ROBUST. If the device doesn't have that
    bit, then the skb is fed to the GSO engine, which will allow the
    packet to be sent to the hardware if it passes the header check.

    This patch changes the sg flag to a full features flag. The same
    method can be used to implement TSO ECN support: we simply have to
    mark packets with CWR set with SKB_GSO_ECN so that only hardware with
    a corresponding NETIF_F_TSO_ECN can accept them. The GSO engine can
    either fully segment the packet, or segment the first MTU and pass the
    rest to the hardware for further segmentation.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
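    The dispatch decision this describes boils down to a feature test on
    the output device; a sketch of the transmit-path logic under the bit
    layout described here (helper names may differ from the era's exact
    code):

        /* A GSO skb may bypass the software GSO engine only if the
         * device advertises every bit its gso_type demands; for a
         * DODGY skb that includes NETIF_F_GSO_ROBUST. */
        if (skb_shinfo(skb)->gso_size &&
            !skb_gso_ok(skb, dev->features)) {
                segs = skb_gso_segment(skb, dev->features);
                if (IS_ERR(segs))
                        goto drop;      /* header verification failed */
                /* transmit the segment list instead of the superpacket */
        }
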
* [NET]: Fix CHECKSUM_HW GSO problems. (Herbert Xu, 2006-06-26; 1 file, -11/+11)

    Fix checksum problems in the GSO code path for CHECKSUM_HW packets.

    The IPv4 TCP pseudo-header checksum has to be adjusted for GSO
    segmented packets. The adjustment is needed because the length field
    in the pseudo-header changes. However, because we have the inequality
    oldlen > newlen, we know that delta = (u16)~oldlen + newlen is still a
    16-bit quantity. This also means that htonl(delta) + th->check still
    fits in 32 bits. Therefore we don't have to use csum_add on this
    operation.

    This is based on a patch by Michael Chan <mchan@broadcom.com>.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Acked-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
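    A sketch of the arithmetic as described above, where oldlen is the
    original pseudo-header length and newlen the per-segment one (type
    handling simplified relative to real kernel checksum code):

        /* oldlen > newlen guarantees the delta fits in 16 bits, so a
         * single 32-bit add plus a fold replaces a csum_add() chain. */
        u32 delta = htonl((u16)~oldlen + newlen);

        th->check = ~csum_fold(th->check + delta);
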
* [NET]: Add software TSOv4 (Herbert Xu, 2006-06-23; 1 file, -0/+62)

    This patch adds the GSO implementation for IPv4 TCP.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
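    The commit message is terse, so here is a rough skeleton of what a
    TCP gso_segment implementation of this shape looks like (an
    illustrative sketch, not the committed function):

        static struct sk_buff *tcp_tso_segment(struct sk_buff *skb,
                                               int features)
        {
                struct sk_buff *segs = ERR_PTR(-EINVAL);
                struct tcphdr *th;
                unsigned int thlen;

                if (!pskb_may_pull(skb, sizeof(*th)))
                        goto out;

                th = skb->h.th;
                thlen = th->doff * 4;
                if (thlen < sizeof(*th) || !pskb_may_pull(skb, thlen))
                        goto out;

                __skb_pull(skb, thlen);

                /* Let the generic code split the payload into
                 * gso_size-sized pieces, each with a copied header... */
                segs = skb_segment(skb, features);
                if (IS_ERR(segs))
                        goto out;

                /* ...then walk the list, advancing th->seq by the mss,
                 * clearing FIN/PSH on all but the last segment, and
                 * recomputing each segment's checksum. */
        out:
                return segs;
        }
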
* [NET]: Merge TSO/UFO fields in sk_buff (Herbert Xu, 2006-06-23; 1 file, -2/+2)

    Having separate fields in sk_buff for TSO/UFO (tso_size/ufo_size) is
    not going to scale if we add any more segmentation methods (e.g.,
    DCCP). So let's merge them.

    They were used to tell the protocol of a packet. This function has
    been subsumed by the new gso_type field, which is essentially a set of
    netdev feature bits (shifted by 16 bits) that are required to process
    a specific skb. As such it's easy to tell whether a given device can
    process a GSO skb: you just have to AND the gso_type field with the
    netdev's features field.

    I've made gso_type a conjunction. The idea is that you have a base
    type (e.g., SKB_GSO_TCPV4) that can be modified further to support new
    features. For example, if we add a hardware TSO type that supports
    ECN, it would declare NETIF_F_TSO | NETIF_F_TSO_ECN. All TSO packets
    with CWR set would have a gso_type of SKB_GSO_TCPV4 |
    SKB_GSO_TCPV4_ECN, while all other TSO packets would be
    SKB_GSO_TCPV4. This means that only the CWR packets need to be
    emulated in software.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
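    The "AND the two fields" test reads roughly like this (sketch; the
    shift constant reflects the 16-bit offset described above):

        /* A device can handle this GSO skb iff every feature bit the
         * skb's gso_type demands is present in dev->features. */
        static inline int net_gso_ok(int features, int gso_type)
        {
                int feature = gso_type << 16;   /* NETIF_F_GSO_SHIFT */

                return (features & feature) == feature;
        }
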
* [NET]: Add NETIF_F_GEN_CSUM and NETIF_F_ALL_CSUM (Herbert Xu, 2006-06-18; 1 file, -8/+2)

    The current stack treats NETIF_F_HW_CSUM and NETIF_F_NO_CSUM
    identically, so we test for them in quite a few places. For the sake
    of brevity, I'm adding the macro NETIF_F_GEN_CSUM for these two. We
    also test the disjunction of NETIF_F_IP_CSUM and the other two in
    various places; for that purpose I've added NETIF_F_ALL_CSUM.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
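    Spelled out, the two macros are simply (as reconstructed from the
    description; they live in netdevice.h):

        #define NETIF_F_GEN_CSUM   (NETIF_F_NO_CSUM | NETIF_F_HW_CSUM)
        #define NETIF_F_ALL_CSUM   (NETIF_F_IP_CSUM | NETIF_F_GEN_CSUM)
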
* [I/OAT]: TCP recv offload to I/OAT (Chris Leech, 2006-06-18; 1 file, -12/+91)

    Locks down user pages and sets up for DMA in tcp_recvmsg, then calls
    dma_async_try_early_copy in tcp_v4_do_rcv.

    Signed-off-by: Chris Leech <christopher.leech@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
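    A sketch of the receive-side setup this describes; the field and
    helper names beyond dma_async_try_early_copy are assumptions based on
    the NET_DMA code of this era, so treat them as illustrative:

        #ifdef CONFIG_NET_DMA
        /* In tcp_recvmsg(): pin the user iovec up front so the DMA
         * engine can later copy skb data into it directly, assumed
         * gated by a copybreak threshold. */
        if (len > sysctl_tcp_dma_copybreak && !(flags & MSG_PEEK))
                tp->ucopy.pinned_list =
                        dma_pin_iovec_pages(msg->msg_iov, len);
        #endif
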
* [I/OAT]: Make sk_eat_skb I/OAT aware. (Chris Leech, 2006-06-18; 1 file, -4/+4)

    Add an extra argument to sk_eat_skb, and make it move early copied
    packets to the async_wait_queue instead of freeing them.

    Signed-off-by: Chris Leech <christopher.leech@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
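    A sketch of the reworked helper implied by that description (the
    queue field name follows the text; not quoted from the patch):

        static inline void sk_eat_skb(struct sock *sk, struct sk_buff *skb,
                                      int copied_early)
        {
                __skb_unlink(skb, &sk->sk_receive_queue);
                if (!copied_early)
                        __kfree_skb(skb);
                else
                        /* DMA copy still in flight: park the skb until
                         * the engine completes, instead of freeing it. */
                        __skb_queue_tail(&sk->sk_async_wait_queue, skb);
        }
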
* [I/OAT]: Rename cleanup_rbuf to tcp_cleanup_rbuf and make non-static (Chris Leech, 2006-06-18; 1 file, -5/+5)

    Needed to be able to call tcp_cleanup_rbuf in tcp_input.c for I/OAT.

    Signed-off-by: Chris Leech <christopher.leech@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix sock_orphan dead lock (Herbert Xu, 2006-05-04; 1 file, -4/+9)

    Calling sock_orphan inside bh_lock_sock in tcp_close can lead to
    deadlocks. For example, the inet_diag code holds sk_callback_lock
    without disabling BH. If an inbound packet arrives during that
    admittedly tiny window, it will cause a deadlock on bh_lock_sock.
    Another possible path would be through sock_wfree if the network
    device driver frees the tx skb in process context with BH enabled.

    We can fix this by moving sock_orphan out of bh_lock_sock. The tricky
    bit is to work out when we need to destroy the socket ourselves and
    when it has already been destroyed by someone else.

    By moving sock_orphan before the release_sock we can solve this
    problem. This is because as long as we own the socket lock its state
    cannot change. So we simply record the socket state before the
    release_sock and then check the state again after we regain the
    socket lock. If the socket state has transitioned to TCP_CLOSE in the
    time being, we know that the socket has been destroyed. Otherwise the
    socket is still ours to keep.

    Note that I've also moved the increment on the orphan count forward.
    This may look like a problem as we're increasing it even if the socket
    is just about to be destroyed, where it'll be decreased again.
    However, this simply enlarges a window that already exists. This also
    changes the orphan count test by one. Considering what the orphan
    count is meant to do, this is no big deal.

    This problem was discovered by Ingo Molnar using his lock validator.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
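    The record-then-recheck dance described above looks roughly like this
    in the tail of tcp_close() (a sketch of the ordering only, not the
    full function):

        sock_orphan(sk);                /* now outside bh_lock_sock */
        atomic_inc(sk->sk_prot->orphan_count);

        /* While we hold the socket lock its state cannot change, so
         * snapshot it before letting go. */
        state = sk->sk_state;
        release_sock(sk);

        local_bh_disable();
        bh_lock_sock(sk);

        /* If it became TCP_CLOSE in the meantime, someone else already
         * destroyed the socket; otherwise it is still ours to free. */
        if (state != TCP_CLOSE && sk->sk_state == TCP_CLOSE)
                goto out;
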
* Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 (Linus Torvalds, 2006-03-25; 1 file, -10/+16)

    * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
      [NETFILTER] x_table.c: sem2mutex
      [IPV4]: Aggregate route entries with different TOS values
      [TCP]: Mark tcp_*mem[] __read_mostly.
      [TCP]: Set default max buffers from memory pool size
      [SCTP]: Fix up sctp_rcv return value
      [NET]: Take RTNL when unregistering notifier
      [WIRELESS]: Fix config dependencies.
      [NET]: Fill in a 32-bit hole in struct sock on 64-bit platforms.
      [NET]: Ensure device name passed to SO_BINDTODEVICE is NULL terminated.
      [MODULES]: Don't allow statically declared exports
      [BRIDGE]: Unaligned accesses in the ethernet bridge
| * [TCP]: Mark tcp_*mem[] __read_mostly. (David S. Miller, 2006-03-25; 1 file, -3/+4)

      Suggested by Stephen Hemminger.

      Signed-off-by: David S. Miller <davem@davemloft.net>
| * [TCP]: Set default max buffers from memory pool size (John Heffner, 2006-03-25; 1 file, -9/+14)

      This patch sets the maximum TCP buffer sizes (available to automatic
      buffer tuning, not to setsockopt) based on the TCP memory pool size.
      The maximum sndbuf and rcvbuf each will be up to 4 MB, but no more
      than 1/128 of the memory pressure threshold.

      Signed-off-by: John Heffner <jheffner@psc.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
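      In tcp_init() terms the cap works out to something like this
      (sketch; sysctl_tcp_mem[1] is the pressure threshold in pages,
      integer-width details simplified):

          /* 1/128 of the pressure threshold, converted to bytes:
           * << PAGE_SHIFT gives bytes, >> 7 divides by 128. */
          unsigned long limit =
                  ((unsigned long)sysctl_tcp_mem[1]) << (PAGE_SHIFT - 7);
          unsigned long max_share = min(4UL * 1024 * 1024, limit);

          sysctl_tcp_wmem[2] = max(64 * 1024, (int)max_share);
          sysctl_tcp_rmem[2] = max(87380, (int)max_share);
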
* | [PATCH] POLLRDHUP/EPOLLRDHUP handling for half-closed devices notifications (Davide Libenzi, 2006-03-25; 1 file, -1/+1)

    Implement the half-closed devices notification by adding a new
    POLLRDHUP (and its alias EPOLLRDHUP) bit to the existing poll/select
    sets. Since changing the existing POLLHUP handling, which does not
    correctly report half-closed devices, was feared to be too risky, this
    implementation leaves the current POLLHUP reporting unchanged and
    simply adds a new bit that is set in the few places where it makes
    sense. The same thing was discussed and conceptually agreed quite some
    time ago: http://lkml.org/lkml/2003/7/12/116

    Since this new event bit is added to the existing Linux poll
    infrastructure, even the existing poll/select system calls will be
    able to use it. As far as the existing POLLHUP handling goes, the
    patch leaves it as is.

    The pollrdhup-2.6.16.rc5-0.10.diff defines POLLRDHUP for all the
    existing archs and sets the bit in the six relevant files. The other
    attached diff is the simple change required to sys/epoll.h to add the
    EPOLLRDHUP definition.

    There is "a stupid program" to test POLLRDHUP delivery here:
    http://www.xmailserver.org/pollrdhup-test.c
    It tests poll(2), but since the delivery is the same, epoll(2) will
    work equally.

    Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Michael Kerrisk <mtk-manpages@gmx.net>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>
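    From userspace the new bit is consumed like any other poll event; a
    minimal illustration (glibc exposes POLLRDHUP under _GNU_SOURCE):

        #define _GNU_SOURCE
        #include <poll.h>

        /* Returns 1 if the peer has shut down its write side (sent FIN)
         * even though the connection is not fully hung up. */
        int peer_half_closed(int fd)
        {
                struct pollfd pfd = { .fd = fd, .events = POLLRDHUP };

                if (poll(&pfd, 1, 0) > 0 && (pfd.revents & POLLRDHUP))
                        return 1;
                return 0;
        }
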
* [NET]: Indentation & other cleanups related to compat_[gs]etsockopt cset (Arnaldo Carvalho de Melo, 2006-03-21; 1 file, -8/+8)

    No code changes, just tidying up, in some cases moving EXPORT_SYMBOLs
    to just after the function exported, etc.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICSK] compat: Introduce inet_csk_compat_[gs]etsockopt (Arnaldo Carvalho de Melo, 2006-03-21; 1 file, -20/+6)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: {get|set}sockopt compatibility layer (Dmitry Mishin, 2006-03-21; 1 file, -12/+65)

    This patch extends the {get|set}sockopt compatibility layer in order
    to move protocol-specific parts to their place and avoid a huge
    universal net/compat.c file in the future.

    Signed-off-by: Dmitry Mishin <dim@openvz.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
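    The shape of the layer: each protocol supplies its own compat hooks
    rather than funnelling everything through net/compat.c. A sketch of
    the hook signatures (member names as I understand the design, not
    quoted from the patch):

        struct proto {
                /* ... existing members elided ... */
                int (*compat_setsockopt)(struct sock *sk, int level,
                                         int optname, char __user *optval,
                                         int optlen);
                int (*compat_getsockopt)(struct sock *sk, int level,
                                         int optname, char __user *optval,
                                         int __user *optlen);
        };
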
* [IP_SOCKGLUE]: Remove most of the tcp specific calls (Arnaldo Carvalho de Melo, 2006-01-03; 1 file, -1/+1)

    As DCCP needs to be called in the same spots.

    Now we have a member in inet_sock (is_icsk), set at sock creation time
    from struct inet_protosw->flags (if INET_PROTOSW_ICSK is set, as for
    TCP and DCCP), to tell whether a struct sock instance is an
    inet_connection_sock. This is used in places like the ones in
    ip_sockglue.c (v4 and v6) where we previously checked whether sk_type
    was SOCK_STREAM; that test is insufficient because we now use the same
    code for DCCP, which has sk_type SOCK_DCCP.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
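    In other words, the test in the sockglue code changes along these
    lines (an illustrative fragment, not the committed diff):

        /* was: if (sk->sk_type == SOCK_STREAM), which DCCP (SOCK_DCCP)
         * would fail despite being connection-oriented */
        if (inet_sk(sk)->is_icsk)
                /* take the inet_connection_sock path */
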
* [ICSK]: Rename struct tcp_func to struct inet_connection_sock_af_ops (Arnaldo Carvalho de Melo, 2006-01-03; 1 file, -4/+4)

    And move it to struct inet_connection_sock. DCCP will use it in the
    upcoming changesets.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Add const markers to various variables. (Arjan van de Ven, 2005-11-30; 1 file, -1/+1)

    The patch below marks various variables const in net/; the goal is to
    move them to the .rodata section so that they can't false-share cache
    lines with things that get written to, as well as potentially helping
    gcc a bit with optimisations. (These were found using a gcc patch to
    warn about such variables.)

    Signed-off-by: Arjan van de Ven <arjan@infradead.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] tcp/route: Another look at hash table sizes (Mike Stroyan, 2005-11-30; 1 file, -4/+2)

    The tcp_ehash hash table gets too big on systems with really big
    memory. It is worse on systems with pages larger than 4KB. It wastes
    memory that could be better used. It also makes the netstat command
    slow because reading /proc/net/tcp and /proc/net/tcp6 needs to go
    through the full hash table.

    The default value should not be larger for larger page sizes. It
    seems that the effect of page size is an unintended error dating back
    a long time. I also wonder if the default value really should be a
    larger fraction of memory for systems with more memory. While systems
    with really big ram can afford more space for hash tables, it is not
    clear to me that they benefit from increasing the allocation ratio
    for this table.

    The amount of memory allocated is determined by
    net/ipv4/tcp.c:tcp_init and mm/page_alloc.c:alloc_large_system_hash.

    tcp_init calls alloc_large_system_hash passing parameters:
        bucketsize = sizeof(struct tcp_ehash_bucket)
        numentries = thash_entries
        scale      = (num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT)
                                                   : (27-PAGE_SHIFT)
        limit      = 0

    On i386, PAGE_SHIFT is 12 for a page size of 4K.
    On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K.

    The num_physpages test above makes the allocation take a larger
    fraction of the total memory on systems with larger memory. The
    threshold size for an i386 system is 512MB. For an ia64 system with
    16KB pages the threshold is 2GB.

    For smaller memory systems:
        On i386, scale = (27 - 12) = 15
        On ia64, scale = (27 - 14) = 13
    For larger memory systems:
        On i386, scale = (25 - 12) = 13
        On ia64, scale = (25 - 14) = 11

    For the rest of this discussion, I'll just track the larger memory
    case.

    The default behavior has numentries = thash_entries = 0, so the
    allocated size is determined by either scale or by the default limit
    of 1/16 of total memory.

    In alloc_large_system_hash:
        numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
        numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
        numentries >>= 20 - PAGE_SHIFT;
        numentries <<= 20 - PAGE_SHIFT;

    At this point, numentries is pages for all of memory, rounded up to
    the nearest megabyte boundary.

        /* limit to 1 bucket per 2^scale bytes of low memory */
        if (scale > PAGE_SHIFT)
                numentries >>= (scale - PAGE_SHIFT);
        else
                numentries <<= (PAGE_SHIFT - scale);

    On i386, numentries >>= (13 - 12), so numentries is 1/8192 of bytes
    of total memory.
    On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of bytes
    of total memory.

        log2qty = long_log2(numentries);

        do {
                size = bucketsize << log2qty;

    bucketsize is 16, so size is 16 times numentries, rounded down to a
    power of two.

    On i386, size is 1/512 of bytes of total memory.
    On ia64, size is 1/128 of bytes of total memory.

    For smaller systems the results are:
        On i386, size is 1/2048 of bytes of total memory.
        On ia64, size is 1/512 of bytes of total memory.

    The large page effect can be removed by just replacing the use of
    PAGE_SHIFT with a constant of 12 in the calls to
    alloc_large_system_hash. That makes them more like the other uses of
    that function from fs/inode.c and fs/dcache.c.

    Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: spelling fixes (Stephen Hemminger, 2005-11-11; 1 file, -1/+1)

    Minor spelling fixes for TCP code.

    Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Appropriate Byte Count support (Stephen Hemminger, 2005-11-11; 1 file, -0/+1)

    This is an updated version of the RFC3465 ABC patch originally for
    Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting bytes
    acked, rather than packets, when updating congestion control.

    The original ABC described in the RFC applied to a Reno style
    algorithm. For advanced congestion control there is little change
    after leaving slow start.

    Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
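    The core of the idea, in heavily simplified slow-start terms (a
    sketch, not the committed code): grow the window per MSS worth of
    acked bytes, not per ACK received.

        /* Accumulate acked bytes; open the congestion window by one
         * segment only once a full MSS has been covered, so stretch
         * ACKs and delayed ACKs are counted fairly (RFC 3465). */
        tp->bytes_acked += ack - prior_snd_una;
        while (tp->bytes_acked >= tp->mss_cache) {
                tp->bytes_acked -= tp->mss_cache;
                if (tp->snd_cwnd < tp->snd_cwnd_clamp)
                        tp->snd_cwnd++;
        }
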
* [TCP/DCCP]: Randomize port selection (Stephen Hemminger, 2005-11-06; 1 file, -1/+0)

    This patch randomizes the port selected on bind() for connections, to
    help with possible security attacks. It should also be faster in most
    cases, because there is no need for a global lock.

    Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
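    Conceptually the change replaces a locked global rover with a random
    starting point for the local-port search; a sketch of that idea (not
    the committed code, and variable names are assumptions):

        /* Pick a random offset in the ephemeral range, then probe
         * upward until a free port is found: no shared rover, so no
         * global lock to serialize on. */
        int low = sysctl_local_port_range[0];
        int high = sysctl_local_port_range[1];
        int remaining = (high - low) + 1;
        int port = low + (net_random() % remaining);
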
* [TCP]: Fix TCP_OFF() bug check introduced by previous change. (Herbert Xu, 2005-09-06; 1 file, -2/+2)

    The TCP_OFF assignment at the bottom of that if block can indeed set
    TCP_OFF without setting TCP_PAGE. Since there is not much to be gained
    from avoiding this situation, we might as well just zap the offset.
    The following patch should fix it.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix sk_forward_alloc underflow in tcp_sendmsg (Herbert Xu, 2005-09-02; 1 file, -5/+9)

    I've finally found a potential cause of the sk_forward_alloc
    underflows that people have been reporting sporadically.

    When tcp_sendmsg tacks on extra bits to an existing TCP_PAGE we don't
    check sk_forward_alloc, even though a large amount of time may have
    elapsed since we allocated the page. In the meantime someone could've
    come along and liberated packets and reclaimed sk_forward_alloc
    memory.

    This patch makes tcp_sendmsg check sk_forward_alloc every time, as we
    do in do_tcp_sendpages.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Add sk_stream_wmem_schedule (Herbert Xu, 2005-09-02; 1 file, -2/+1)

    This patch introduces sk_stream_wmem_schedule as a short-hand for the
    sk_forward_alloc checking on egress.

    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
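    The helper is small; something along these lines (sketch based on
    the description):

        static inline int sk_stream_wmem_schedule(struct sock *sk, int size)
        {
                /* Either the existing quota covers this write, or try
                 * to schedule more forward-allocated memory for it. */
                return size <= sk->sk_forward_alloc ||
                       sk_stream_mem_schedule(sk, size, 0);
        }

    This is what the underflow fix above then calls on every append in
    tcp_sendmsg, matching do_tcp_sendpages.
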
* [NET]: use __read_mostly on kmem_cache_t, DEFINE_SNMP_STAT pointers (Eric Dumazet, 2005-08-30; 1 file, -1/+1)

    This patch puts mostly-read-only data in the right section
    (read_mostly), to help sharing of these data between CPUs without
    memory ping-pongs.

    On one of my production machines, tcp_statistics was sitting in a
    heavily modified cache line, so *every* SNMP update had to force a
    reload.

    Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
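    For tcp.c the change amounts to a one-word annotation, roughly:

        /* Keep the stats pointer in .data.read_mostly so writes to
         * neighbouring hot data cannot bounce its cache line. */
        DEFINE_SNMP_STAT(struct tcp_mib, tcp_statistics) __read_mostly;
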
* [ICSK]: Generalise tcp_listen_poll (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -10/+1)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICSK]: Move TCP congestion avoidance members to icsk (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -6/+6)

    This changeset basically moves tcp_sk()->{ca_ops,ca_state,etc} to
    inet_csk(); minimal renaming/moving is done in this changeset to ease
    review. Most of it is just changes of struct tcp_sock * to struct
    sock * parameters.

    With this we move to a state closer to two interesting goals:

    1. Generalisation of net/ipv4/tcp_diag.c, becoming inet_diag.c, used
       for any INET transport protocol that has struct inet_hashinfo and
       is derived from struct inet_connection_sock. This keeps the
       userspace API: older tools will just not display DCCP sockets,
       while newer versions of the tools can support DCCP.

    2. An INET generic transport pluggable congestion avoidance
       infrastructure, using the current TCP CA infrastructure with DCCP.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [TIMEWAIT]: Introduce inet_timewait_death_row (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -2/+2)

    That groups all of the tables and variables associated with the TCP
    timewait scheduling/recycling/killing code, which can now be isolated
    from the TCP-specific code and used by other transport protocols,
    such as DCCP. The next changeset will move this code to
    net/ipv4/inet_timewait_sock.c.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICSK]: Move generalised functions from tcp to inet_connection_sock (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -120/+0)

    This also improves reqsk_queue_prune and renames it to
    inet_csk_reqsk_queue_prune, as it deals with both inet_connection_sock
    and inet_request_sock objects, not just with request_sock ones, thus
    belonging with inet_request_sock.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICSK]: Introduce reqsk_queue_prune from code in tcp_synack_timer (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -10/+11)

    With this we're very close to getting all of the current TCP
    refactorings in my dccp-2.6 tree merged; the next changeset will
    export some functions needed by the current DCCP code, and then
    dccp-2.6.git will be born!

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [ICSK]: Generalise tcp_listen_{start,stop} (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -17/+21)

    This also moved inet_iif from tcp to inet_hashtables.h, as it is
    needed by the inet_lookup callers. Perhaps this needs a bit of
    polishing, but for now it seems fine.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
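    For reference, inet_iif is a one-liner that recovers the input
    interface from the skb's route, as it looked in kernels of this
    vintage (quoted from memory, so treat as a sketch):

        static inline int inet_iif(const struct sk_buff *skb)
        {
                return ((struct rtable *)skb->dst)->rt_iif;
        }
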
* [NET]: Just move the inet_connection_sock function from tcp sources (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -93/+0)

    Completing the previous changeset, this also generalises
    tcp_v4_synq_add, renaming it to inet_csk_reqsk_queue_hash_add,
    already being used in the DCCP tree, which I plan to merge RSN.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Introduce inet_connection_sock (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -43/+47)

    This creates struct inet_connection_sock, moving members out of
    struct tcp_sock that are shareable with other INET
    connection-oriented protocols, such as DCCP, which in my private tree
    already uses most of these members.

    The functions that operate on these members were renamed, using an
    inet_csk_ prefix, while not yet being moved to a new file, so as to
    ease the review of these changes.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
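    An abridged sketch of the new struct's layout (member list is
    illustrative, not exhaustive or verbatim):

        struct inet_connection_sock {
                /* inet_sock has to be the first member so casts work */
                struct inet_sock          icsk_inet;
                struct request_sock_queue icsk_accept_queue;
                struct timer_list         icsk_retransmit_timer;
                struct timer_list         icsk_delack_timer;
                unsigned long             icsk_timeout;
                __u32                     icsk_rto;
                /* ... more TCP/DCCP-shareable state ... */
        };
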
* [INET]: Generalise tcp_tw_bucket, aka TIME_WAIT sockets (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -10/+0)

    This paves the way to generalise the rest of the sock ID lookup
    routines and saves some bytes in TCPv4 TIME_WAIT sockets on distro
    kernels (where IPv6 is always built as a module):

        [root@qemu ~]# grep tw_sock /proc/slabinfo
        tw_sock_TCPv6   0   0   128   31   1
        tw_sock_TCP     0   0    96   41   1
        [root@qemu ~]#

    Now if a protocol wants to use the TIME_WAIT generic infrastructure,
    it only has to set the sk_prot->twsk_obj_size field with the size of
    its inet_timewait_sock derived sock, and proto_register will create
    sk_prot->twsk_slab. For now this is only for INET sockets, but we can
    introduce timewait_sock later if some non-INET transport protocol
    wants to use this stuff.

    Next changesets will take advantage of this new infrastructure to
    generalise even more TCP code.

        [acme@toy net-2.6.14]$ grep built-in /tmp/before.size /tmp/after.size
        /tmp/before.size: 188646 11764 5068 205478 322a6 net/ipv4/built-in.o
        /tmp/after.size:  188144 11764 5068 204976 320b0 net/ipv4/built-in.o
        [acme@toy net-2.6.14]$

    Tested with both IPv4 & IPv6 (::1 (localhost) & ::ffff:172.20.0.1
    (qemu host)).

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Move tcp_port_rover to inet_hashinfo (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -21/+21)

    Also expose all of the tcp_hashinfo members, i.e. killing those
    tcp_ehash, etc. macros. This will more clearly expose already-generic
    functions and some that need just a bit of work to become generic, as
    we'll see in the upcoming changesets.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Generalise tcp_bind_hash & tcp_inherit_port (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -4/+0)

    This required moving tcp_bucket_cachep to inet_hashinfo.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Move bind_hash from tcp_sk to inet_sk (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -2/+2)

    This should really be in inet_connection_sock, but I'm leaving it for
    a later optimization, when some more fields common to INET transport
    protocols, now in tcp_sk or inet_sk, will be chunked out into
    inet_connection_sock. For now it's better to concentrate on getting
    the changes in the core merged, to leave the DCCP tree with only
    DCCP-specific code.

    Next changesets will take advantage of this move to generalise things
    like tcp_bind_hash, tcp_put_port and tcp_inherit_port, making the
    latter receive an inet_hashinfo parameter, and even __tcp_tw_hashdance,
    etc. in the future, when tcp_tw_bucket gets transformed into the
    struct timewait_sock hierarchy.

    tcp_destroy_sock is also eligible as soon as tcp_orphan_count gets
    moved to sk_prot.

    A cascade of incremental changes will ultimately make the tcp_lookup
    functions fully generic.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Just rename the TCP hashtable functions/structs to inet_ (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -6/+9)

    This is to break down the complexity of the series of patches, making
    it very clear that this one just does:

    1. Renames tcp_-prefixed hashtable functions and data structures that
       were already mostly generic to inet_, to share them with DCCP and
       other INET transport protocols.

    2. Removes unused functions (__tb_head & tb_head).

    3. Removes some leftover prototypes in the headers (tcp_bucket_unlock
       & tcp_v4_build_header).

    Next changesets will move tcp_sk(sk)->bind_hash to inet_sock so that
    we can make functions such as tcp_inherit_port, __tcp_inherit_port,
    tcp_v4_get_port and __tcp_put_port generic, and get others like
    tcp_destroy_sock closer to generic (tcp_orphan_count will go to
    sk->sk_prot to allow this).

    Eventually most of these functions will be used passing the transport
    protocol's inet_hashinfo structure.

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Cleanup INET_REFCNT_DEBUG code (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -6/+1)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [REQSK]: Move the syn_table destroy from tcp_listen_stop to reqsk_queue_destroy (Arnaldo Carvalho de Melo, 2005-08-30; 1 file, -25/+10)

    Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Kill skb->list (David S. Miller, 2005-08-30; 1 file, -1/+1)

    Remove the "list" member of struct sk_buff, as it is entirely
    redundant. All SKB list removal callers know which list the SKB is
    on, so storing this in sk_buff does nothing other than take up some
    space.

    Two tricky bits were SCTP, which I took care of, and two ATM drivers,
    which Francois Romieu <romieu@fr.zoreil.com> fixed up.

    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
* [TCP]: Unconditionally clear TCP_NAGLE_PUSH in skb_entail(). (David S. Miller, 2005-08-23; 1 file, -1/+1)

    The intention of this bit is to force pushing of the existing send
    queue when TCP_CORK or TCP_NODELAY state changes via setsockopt().
    But it's easy to create a situation where the bit never clears. For
    example, if the send queue starts empty:

    1) set TCP_NODELAY
    2) clear TCP_NODELAY
    3) set TCP_CORK
    4) do small write()

    The current code will leave TCP_NAGLE_PUSH set after that sequence.
    Unconditionally clearing the bit when new data is added via
    skb_entail() solves the problem, as the sketch below shows.

    Signed-off-by: David S. Miller <davem@davemloft.net>
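    A sketch of the fixed helper (body abridged; only the last line is
    the point of the patch):

        static inline void skb_entail(struct sock *sk, struct tcp_sock *tp,
                                      struct sk_buff *skb)
        {
                /* ... queue skb on sk_write_queue, charge socket memory,
                 * advance tp->write_seq ... */

                /* Cleared on every enqueue now, so a push left pending
                 * by an earlier setsockopt() dance cannot go stale. */
                tp->nonagle &= ~TCP_NAGLE_PUSH;
        }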