summaryrefslogtreecommitdiffstats
path: root/net/ipv6/ip6_output.c
Commit message (Collapse)AuthorAgeFilesLines
* ipv6: ip6_sk_dst_check() must not assume ipv6 dstEric Dumazet2013-06-271-1/+7
| | | | | | | | | | | | | | | | | | | | It's possible to use AF_INET6 sockets and to connect to an IPv4 destination. After this, socket dst cache is a pointer to a rtable, not rt6_info. ip6_sk_dst_check() should check the socket dst cache is IPv6, or else various corruptions/crashes can happen. Dave Jones can reproduce immediate crash with trinity -q -l off -n -c sendmsg -c connect With help from Hannes Frederic Sowa Reported-by: Dave Jones <davej@redhat.com> Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Process unicast packet with Router Alert by checking flag in skb.YOSHIFUJI Hideaki / 吉藤英明2013-06-251-3/+2Star
| | | | | | | | | | | | | | | | Router Alert option is marked in skb. Previously, IP6CB(skb)->ra was set to positive value for such packets. Since commit dd3332bf ("ipv6: Store Router Alert option in IP6CB directly."), IP6SKB_ROUTERALERT is set in IP6CB(skb)->flags, and the value of Router Alert option (in network byte order) is set to IP6CB(skb)->ra for such packets. Multicast forwarding path uses that flag and value, but unicast forwarding path does not use the flag and misuses IP6CB(skb)->ra value. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix possible crashes in ip6_cork_release()Eric Dumazet2013-05-181-1/+1
| | | | | | | | | | | | | | commit 0178b695fd6b4 ("ipv6: Copy cork options in ip6_append_data") added some code duplication and bad error recovery, leading to potential crash in ip6_cork_release() as kfree() could be called with garbage. use kzalloc() to make sure this wont happen. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Neal Cardwell <ncardwell@google.com>
* net: sock: make sock_tx_timestamp voidDaniel Borkmann2013-04-141-5/+2Star
| | | | | | | | | | | | | Currently, sock_tx_timestamp() always returns 0. The comment that describes the sock_tx_timestamp() function wrongly says that it returns an error when an invalid argument is passed (from commit 20d4947353be, ``net: socket infrastructure for SO_TIMESTAMPING''). Make the function void, so that we can also remove all the unneeded if conditions that check for such a _non-existant_ error case in the output path. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: don't let node/interface scoped multicast traffic escape on the wireHannes Frederic Sowa2013-02-111-0/+7
| | | | | | | | | Reported-by: Erik Hugne <erik.hugne@ericsson.com> Cc: Erik Hugne <erik.hugne@ericsson.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Don't send packet to big messages to selfSteffen Klassert2013-02-061-2/+1Star
| | | | | | | | | | | Calling icmpv6_send() on a local message size error leads to an incorrect update of the path mtu in the case when IPsec is used. So use ipv6_local_error() instead to notify the socket about the error. Reported-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-01-291-2/+2
|\ | | | | | | | | | | | | Bring in the 'net' tree so that we can get some ipv4/ipv6 bug fixes that some net-next work will build upon. Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv6: fix header length calculation in ip6_append_data()Romain KUNTZ2013-01-171-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 299b0767 (ipv6: Fix IPsec slowpath fragmentation problem) has introduced a error in the header length calculation that provokes corrupted packets when non-fragmentable extensions headers (Destination Option or Routing Header Type 2) are used. rt->rt6i_nfheader_len is the length of the non-fragmentable extension header, and it should be substracted to rt->dst.header_len, and not to exthdrlen, as it was done before commit 299b0767. This patch reverts to the original and correct behavior. It has been successfully tested with and without IPsec on packets that include non-fragmentable extensions headers. Signed-off-by: Romain Kuntz <r.kuntz@ipflavors.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: remove duplicated declaration of ip6_fragment()Cong Wang2013-01-231-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | It is declared in: include/net/ip6_route.h:187:int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *)); and net/ip6_route.h is already included. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Unshare ip6_nd_hdr() and change return type to void.YOSHIFUJI Hideaki / 吉藤英明2013-01-211-33/+0Star
| | | | | | | | | | | | | | | | | | - move ip6_nd_hdr() to its users' source files. In net/ipv6/mcast.c, it will be called ip6_mc_hdr(). - make return type to void since this function never fails. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Do not depend on rt->n in ip6_finish_output2().YOSHIFUJI Hideaki / 吉藤英明2013-01-181-5/+13
| | | | | | | | | | | | | | If neigh is not found, create new one. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Do not depend on rt->n in ip6_dst_lookup_tail().YOSHIFUJI Hideaki / 吉藤英明2013-01-181-2/+6
| | | | | | | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Introduce ip6_flow_hdr() to fill version, tclass and flowlabel.YOSHIFUJI Hideaki / 吉藤英明2013-01-141-5/+3Star
|/ | | | | | | | | | This is not only for readability but also for optimization. What we do here is to build the 32bit word at the beginning of the ipv6 header (the "ip6_flow" virtual member of struct ip6_hdr in RFC3542) and we do not need to read the tclass portion of the target buffer. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Update ipv6 static library with newly needed functionsVlad Yasevich2012-11-151-65/+0Star
| | | | | | | | UDP offload needs some additional functions to be in the static kernel for it work correclty. Move those functions into the core. Signed-off-by: Vlad Yasevich <vyasevic@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: introduce ip6_rt_put()Amerigo Wang2012-11-031-2/+2
| | | | | | | | | | | As suggested by Eric, we could introduce a helper function for ipv6 too, to avoid checking if rt is NULL before dst_release(). Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: use IS_ENABLED()Amerigo Wang2012-11-011-3/+2Star
| | | | | | | | | | | | #if defined(CONFIG_FOO) || defined(CONFIG_FOO_MODULE) can be replaced by #if IS_ENABLED(CONFIG_FOO) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use a per task frag allocatorEric Dumazet2012-09-241-40/+25Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently use a per socket order-0 page cache for tcp_sendmsg() operations. This page is used to build fragments for skbs. Its done to increase probability of coalescing small write() into single segments in skbs still in write queue (not yet sent) But it wastes a lot of memory for applications handling many mostly idle sockets, since each socket holds one page in sk->sk_sndmsg_page Its also quite inefficient to build TSO 64KB packets, because we need about 16 pages per skb on arches where PAGE_SIZE = 4096, so we hit page allocator more than wanted. This patch adds a per task frag allocator and uses bigger pages, if available. An automatic fallback is done in case of memory pressure. (up to 32768 bytes per frag, thats order-3 pages on x86) This increases TCP stream performance by 20% on loopback device, but also benefits on other network devices, since 8x less frags are mapped on transmit and unmapped on tx completion. Alexander Duyck mentioned a probable performance win on systems with IOMMU enabled. Its possible some SG enabled hardware cant cope with bigger fragments, but their ndo_start_xmit() should already handle this, splitting a fragment in sub fragments, since some arches have PAGE_SIZE=65536 Successfully tested on various ethernet devices. (ixgbe, igb, bnx2x, tg3, mellanox mlx4) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Vijay Subramanian <subramanian.vijay@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: remove some useless RCU read lockAmerigo Wang2012-09-101-11/+2Star
| | | | | | | | | | | | | | | After this commit: commit 97cac0821af4474ec4ba3a9e7a36b98ed9b6db88 Author: David S. Miller <davem@davemloft.net> Date: Mon Jul 2 22:43:47 2012 -0700 ipv6: Store route neighbour in rt6_info struct. we no longer use RCU to protect route neighbour. Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: nf_conntrack_ipv6: improve fragmentation handlingPatrick McHardy2012-08-301-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IPv6 conntrack fragmentation currently has a couple of shortcomings. Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the defragmented packet is then passed to conntrack, the resulting conntrack information is attached to each original fragment and the fragments then continue their way through the stack. Helper invocation occurs in the POSTROUTING hook, at which point only the original fragments are available. The result of this is that fragmented packets are never passed to helpers. This patch improves the situation in the following way: - If a reassembled packet belongs to a connection that has a helper assigned, the reassembled packet is passed through the stack instead of the original fragments. - During defragmentation, the largest received fragment size is stored. On output, the packet is refragmented if required. If the largest received fragment size exceeds the outgoing MTU, a "packet too big" message is generated, thus behaving as if the original fragments were passed through the stack from an outside point of view. - The ipv6_helper() hook function can't receive fragments anymore for connections using a helper, so it is switched to use ipv6_skip_exthdr() instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the reassembled packets are passed to connection tracking helpers. The result of this is that we can properly track fragmented packets, but still generate ICMPv6 Packet too big messages if we would have before. This patch is also required as a precondition for IPv6 NAT, where NAT helpers might enlarge packets up to a point that they require fragmentation. In that case we can't generate Packet too big messages since the proper MTU can't be calculated in all cases (f.i. when changing textual representation of a variable amount of addresses), so the packet is transparently fragmented iff the original packet or fragments would have fit the outgoing MTU. IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>. Signed-off-by: Patrick McHardy <kaber@trash.net>
* inet: Minimize use of cached route inetpeer.David S. Miller2012-07-111-2/+8
| | | | | | | | | | | | Only use it in the absolutely required cases: 1) COW'ing metrics 2) ipv4 PMTU 3) ipv4 redirects Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix a bad cast in ip6_dst_lookup_tail()Eric Dumazet2012-07-061-1/+1
| | | | | | | | | Fix a bug in ip6_dst_lookup_tail(), where typeof(dst) is "struct dst_entry **", not "struct dst_entry *" Reported-by: Fengguang Wu <wfg@linux.intel.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Store route neighbour in rt6_info struct.David S. Miller2012-07-051-2/+6
| | | | | | | | This makes for a simplified conversion away from dst_get_neighbour*(). All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*(). Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Do delayed neigh confirmation.David S. Miller2012-07-051-1/+1
| | | | | | | | | | | | | | When a dst_confirm() happens, mark the confirmation as pending in the dst. Then on the next packet out, when we have the neigh in-hand, do the update. This removes the dependency in dst_confirm() of dst's having an attached neigh. While we're here, remove the explicit 'dst' NULL check, all except 2 or 3 call sites ensure it's not NULL. So just fix those cases up. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-06-131-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: MAINTAINERS drivers/net/wireless/iwlwifi/pcie/trans.c The iwlwifi conflict was resolved by keeping the code added in 'net' that turns off the buggy chip feature. The MAINTAINERS conflict was merely overlapping changes, one change updated all the wireless web site URLs and the other changed some GIT trees to be Johannes's instead of John's. Signed-off-by: David S. Miller <davem@davemloft.net>
| * snmp: fix OutOctets counter to include forwarded datagramsVincent Bernat2012-06-071-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC 4293 defines ipIfStatsOutOctets (similar definition for ipSystemStatsOutOctets): The total number of octets in IP datagrams delivered to the lower layers for transmission. Octets from datagrams counted in ipIfStatsOutTransmits MUST be counted here. And ipIfStatsOutTransmits: The total number of IP datagrams that this entity supplied to the lower layers for transmission. This includes datagrams generated locally and those forwarded by this entity. Therefore, IPSTATS_MIB_OUTOCTETS must be incremented when incrementing IPSTATS_MIB_OUTFORWDATAGRAMS. IP_UPD_PO_STATS is not used since ipIfStatsOutRequests must not include forwarded datagrams: The total number of IP datagrams that local IP user-protocols (including ICMP) supplied to IP in requests for transmission. Note that this counter does not include any datagrams counted in ipIfStatsOutForwDatagrams. Signed-off-by: Vincent Bernat <bernat@luffy.cx> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net-next: add dev_loopback_xmit() to avoid duplicate codeMichel Machado2012-06-131-14/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add dev_loopback_xmit() in order to deduplicate functions ip_dev_loopback_xmit() (in net/ipv4/ip_output.c) and ip6_dev_loopback_xmit() (in net/ipv6/ip6_output.c). I was about to reinvent the wheel when I noticed that ip_dev_loopback_xmit() and ip6_dev_loopback_xmit() do exactly what I need and are not IP-only functions, but they were not available to reuse elsewhere. ip6_dev_loopback_xmit() does not have line "skb_dst_force(skb);", but I understand that this is harmless, and should be in dev_loopback_xmit(). Signed-off-by: Michel Machado <michel@digirati.com.br> CC: "David S. Miller" <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jpirko@redhat.com> CC: "Michał Mirosław" <mirq-linux@rere.qmqm.pl> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Create and use rt{,6}_get_peer_create().David S. Miller2012-06-091-7/+4Star
|/ | | | | | | | | | | There's a lot of places that open-code rt{,6}_get_peer() only because they want to set 'create' to one. So add an rt{,6}_get_peer_create() for their sake. There were also a few spots open-coding plain rt{,6}_get_peer() and those are transformed here as well. Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix incorrect ipsec fragmentGao feng2012-05-271-18/+50
| | | | | | | | | | | | | | | | | | | | | | | Since commit ad0081e43a "ipv6: Fragment locally generated tunnel-mode IPSec6 packets as needed" the fragment of packets is incorrect. because tunnel mode needs IPsec headers and trailer for all fragments, while on transport mode it is sufficient to add the headers to the first fragment and the trailer to the last. so modify mtu and maxfraglen base on ipsec mode and if fragment is first or last. with my test,it work well(every fragment's size is the mtu) and does not trigger slow fragment path. Changes from v1: though optimization, mtu_prev and maxfraglen_prev can be delete. replace xfrm mode codes with dst_entry's new frag DST_XFRM_TUNNEL. add fuction ip6_append_data_mtu to make codes clearer. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: disable GSO on sockets hitting dst_allfragEric Dumazet2012-05-191-1/+4
| | | | | | | | | | | | | | | | | | | | | | If the allfrag feature has been set on a host route (due to an ICMPv6 Packet Too Big received indicating a MTU of less than 1280), we hit a very slow behavior in TCP stack, because all big packets are dropped and only a retransmit timer is able to push one MSS frame every 200 ms. One way to handle this is to disable GSO on the socket the first time a super packet is dropped. Adding a specific dst_allfrag() in the fast path is probably overkill since the dst_allfrag() case almost never happen. Result on netperf TCP_STREAM, one flow : Before : 60 kbit/sec After : 1.6 Gbit/sec Reported-by: Tore Anderson <tore@fud.no> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: ip6_fragment() should check CHECKSUM_PARTIALEric Dumazet2012-05-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoting Tore Anderson from : If the allfrag feature has been set on a host route (due to an ICMPv6 Packet Too Big received indicating a MTU of less than 1280), TCP SYN/ACK packets to that destination appears to get an incorrect TCP checksum. This in turn means they are thrown away as invalid. In the case of an IPv4 client behind a link with a MTU of less than 1260, accessing an IPv6 server through a stateless translator, this means that the client can only download a single large file from the server, because once it is in the server's routing cache with the allfrag feature set, new TCP connections can no longer be established. </endquote> It appears ip6_fragment() doesn't handle CHECKSUM_PARTIAL properly. As network drivers are not prepared to fetch correct transport header, a safe fix is to call skb_checksum_help() before fragmenting packet. Reported-by: Tore Anderson <tore@fud.no> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Tore Anderson <tore@fud.no> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: remove csummode in ip6_append_data()Eric Dumazet2012-05-181-3/+1Star
| | | | | | | csummode variable is always CHECKSUM_NONE in ip6_append_data() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Convert net_ratelimit uses to net_<level>_ratelimitedJoe Perches2012-05-151-2/+1Star
| | | | | | | | | | Standardize the net core ratelimited logging functions. Coalesce formats, align arguments. Change a printk then vprintk sequence to use printf extension %pV. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Export ipv6 functions for use by other protocolsChris Elston2012-05-011-0/+3
| | | | | | | | | | For implementing other protocols on top of IPv6, such as L2TPv3's IP encapsulation over ipv6, we'd like to call some IPv6 functions which are not currently exported. This patch exports them. Signed-off-by: Chris Elston <celston@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: call consume_skb() in frag/reassemblyEric Dumazet2012-04-261-2/+2
| | | | | | | | Some kfree_skb() calls should be replaced by consume_skb() to avoid drop_monitor/dropwatch false positives. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix incorrent ipv6 ipsec packet fragmentGao feng2012-03-201-5/+6
| | | | | | | | | | | | | Since commit 299b0767(ipv6: Fix IPsec slowpath fragmentation problem) In func ip6_append_data,after call skb_put(skb, fraglen + dst_exthdrlen) the skb->len contains dst_exthdrlen,and we don't reduce dst_exthdrlen at last This will make fraggap>0 in next "while cycle",and cause the size of skb incorrent Fix this by reserve headroom for dst_exthdrlen. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Eliminate dst_get_neighbour_noref() usage in ip6_forward().David S. Miller2012-01-281-5/+3Star
| | | | | | | It's only used to get at neigh->primary_key, which in this context is always going to be the same as rt->rt6i_gateway. Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Remove neigh argument from ndisc_send_redirect()David S. Miller2012-01-281-1/+1
| | | | | | | Instead, compute it as-needed inside of that function using dst_neigh_lookup(). Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-12-231-1/+1
|\ | | | | | | | | | | | | | | | | | | Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: introduce DST_NOPEER dst flagEric Dumazet2011-12-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chris Boot reported crashes occurring in ipv6_select_ident(). [ 461.457562] RIP: 0010:[<ffffffff812dde61>] [<ffffffff812dde61>] ipv6_select_ident+0x31/0xa7 [ 461.578229] Call Trace: [ 461.580742] <IRQ> [ 461.582870] [<ffffffff812efa7f>] ? udp6_ufo_fragment+0x124/0x1a2 [ 461.589054] [<ffffffff812dbfe0>] ? ipv6_gso_segment+0xc0/0x155 [ 461.595140] [<ffffffff812700c6>] ? skb_gso_segment+0x208/0x28b [ 461.601198] [<ffffffffa03f236b>] ? ipv6_confirm+0x146/0x15e [nf_conntrack_ipv6] [ 461.608786] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.614227] [<ffffffff81271d64>] ? dev_hard_start_xmit+0x357/0x543 [ 461.620659] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.626440] [<ffffffffa0379745>] ? br_parse_ip_options+0x19a/0x19a [bridge] [ 461.633581] [<ffffffff812722ff>] ? dev_queue_xmit+0x3af/0x459 [ 461.639577] [<ffffffffa03747d2>] ? br_dev_queue_push_xmit+0x72/0x76 [bridge] [ 461.646887] [<ffffffffa03791e3>] ? br_nf_post_routing+0x17d/0x18f [bridge] [ 461.653997] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.659473] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.665485] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.671234] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.677299] [<ffffffffa0379215>] ? nf_bridge_update_protocol+0x20/0x20 [bridge] [ 461.684891] [<ffffffffa03bb0e5>] ? nf_ct_zone+0xa/0x17 [nf_conntrack] [ 461.691520] [<ffffffffa0374760>] ? br_flood+0xfa/0xfa [bridge] [ 461.697572] [<ffffffffa0374812>] ? NF_HOOK.constprop.8+0x3c/0x56 [bridge] [ 461.704616] [<ffffffffa0379031>] ? nf_bridge_push_encap_header+0x1c/0x26 [bridge] [ 461.712329] [<ffffffffa037929f>] ? br_nf_forward_finish+0x8a/0x95 [bridge] [ 461.719490] [<ffffffffa037900a>] ? nf_bridge_pull_encap_header+0x1c/0x27 [bridge] [ 461.727223] [<ffffffffa0379974>] ? br_nf_forward_ip+0x1c0/0x1d4 [bridge] [ 461.734292] [<ffffffff81291c4d>] ? nf_iterate+0x41/0x77 [ 461.739758] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.746203] [<ffffffff81291cf6>] ? nf_hook_slow+0x73/0x111 [ 461.751950] [<ffffffffa03748cc>] ? __br_deliver+0xa0/0xa0 [bridge] [ 461.758378] [<ffffffffa037533a>] ? NF_HOOK.constprop.4+0x56/0x56 [bridge] This is caused by bridge netfilter special dst_entry (fake_rtable), a special shared entry, where attaching an inetpeer makes no sense. Problem is present since commit 87c48fa3b46 (ipv6: make fragment identifications less predictable) Introduce DST_NOPEER dst flag and make sure ipv6_select_ident() and __ip_select_ident() fallback to the 'no peer attached' handling. Reported-by: Chris Boot <bootc@bootc.net> Tested-by: Chris Boot <bootc@bootc.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller2011-12-051-3/+3
| | | | | | | | | | | | | | | | To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
* | ipv6: Add fragment reporting to ipv6_skip_exthdr().Jesse Gross2011-12-031-1/+2
| | | | | | | | | | | | | | | | | | | | While parsing through IPv6 extension headers, fragment headers are skipped making them invisible to the caller. This reports the fragment offset of the last header in order to make it possible to determine whether the packet is fragmented and, if so whether it is a first or last fragment. Signed-off-by: Jesse Gross <jesse@nicira.com>
* | net: remove ipv6_addr_copy()Alexey Dobriyan2011-11-221-9/+9
| | | | | | | | | | | | | | C assignment can handle struct in6_addr copying. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv6: Remove all uses of LL_ALLOCATED_SPACEHerbert Xu2011-11-181-2/+6
|/ | | | | | | | | | | | | | | | | | | ipv6: Remove all uses of LL_ALLOCATED_SPACE The macro LL_ALLOCATED_SPACE was ill-conceived. It applies the alignment to the sum of needed_headroom and needed_tailroom. As the amount that is then reserved for head room is needed_headroom with alignment, this means that the tail room left may be too small. This patch replaces all uses of LL_ALLOCATED_SPACE in net/ipv6 with the macro LL_RESERVED_SPACE and direct reference to needed_tailroom. This also fixes the problem with needed_headroom changing between allocating the skb and reserving the head room. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: fix error propagation in ip6_ufo_append_data()Zheng Yan2011-10-281-1/+1
| | | | | | | We should return errcode from sock_alloc_send_skb() Signed-off-by: Zheng Yan <zheng.z.yan@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: tcp: fix TCLASS value in ACK messages sent from TIME_WAITEric Dumazet2011-10-271-5/+2Star
| | | | | | | | | | | | | | | commit 66b13d99d96a (ipv4: tcp: fix TOS value in ACK messages sent from TIME_WAIT) fixed IPv4 only. This part is for the IPv6 side, adding a tclass param to ip6_xmit() We alias tw_tclass and tw_tos, if socket family is INET6. [ if sockets is ipv4-mapped, only IP_TOS socket option is used to fill TOS field, TCLASS is not taken into account ] Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: add skb frag size accessorsEric Dumazet2011-10-191-2/+3
| | | | | | | | | | | To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: Fix IPsec slowpath fragmentation problemSteffen Klassert2011-10-191-6/+12
| | | | | | | | | | | | | ip6_append_data() builds packets based on the mtu from dst_mtu(rt->dst.path). On IPsec the effective mtu is lower because we need to add the protocol headers and trailers later when we do the IPsec transformations. So after the IPsec transformations the packet might be too big, which leads to a slowpath fragmentation then. This patch fixes this by building the packets based on the lower IPsec mtu from dst_mtu(&rt->dst) and adapts the exthdr handling to this. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: ipv6: convert to SKB frag APIsIan Campbell2011-08-251-3/+4
| | | | | | | | | | | | Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: netdev@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix NULL dereferences in check_peer_redir()Eric Dumazet2011-08-031-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | Gergely Kalman reported crashes in check_peer_redir(). It appears commit f39925dbde778 (ipv4: Cache learned redirect information in inetpeer.) added a race, leading to possible NULL ptr dereference. Since we can now change dst neighbour, we should make sure a reader can safely use a neighbour. Add RCU protection to dst neighbour, and make sure check_peer_redir() can be called safely by different cpus in parallel. As neighbours are already freed after one RCU grace period, this patch should not add typical RCU penalty (cache cold effects) Many thanks to Gergely for providing a pretty report pointing to the bug. Reported-by: Gergely Kalman <synapse@hippy.csoma.elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv6: make fragment identifications less predictableEric Dumazet2011-07-221-5/+31
| | | | | | | | | | | | | | | | | | IPv6 fragment identification generation is way beyond what we use for IPv4 : It uses a single generator. Its not scalable and allows DOS attacks. Now inetpeer is IPv6 aware, we can use it to provide a more secure and scalable frag ident generator (per destination, instead of system wide) This patch : 1) defines a new secure_ipv6_id() helper 2) extends inet_getid() to provide 32bit results 3) extends ipv6_select_ident() with a new dest parameter Reported-by: Fernando Gont <fernando@gont.com.ar> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>