summaryrefslogtreecommitdiffstats
path: root/net/ipv6
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | iptunnels: remove net arg from iptunnel_xmit()Nicolas Dichtel2013-09-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This argument is not used, let's remove it. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: neighbour: Remove CONFIG_ARPDTim Gardner2013-09-041-2/+0Star
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This config option is superfluous in that it only guards a call to neigh_app_ns(). Enabling CONFIG_ARPD by default has no change in behavior. There will now be call to __neigh_notify() for each ARP resolution, which has no impact unless there is a user space daemon waiting to receive the notification, i.e., the case for which CONFIG_ARPD was designed anyways. Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Joe Perches <joe@perches.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: unify skb_udp_tunnel_segment() and skb_udp6_tunnel_segment()Cong Wang2013-09-011-55/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by Pravin, we can unify the code in case of duplicated code. Cc: Pravin Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Add generic UDP Tunnel segmentationCong Wang2013-09-012-52/+111
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to commit 731362674580cb0c696cd1b1a03d8461a10cf90a (tunneling: Add generic Tunnel segmentation) This patch adds generic tunneling offloading support for IPv6-UDP based tunnels. This can be used by tunneling protocols like VXLAN. Cc: Jesse Gross <jesse@nicira.com> Cc: Pravin B Shelar <pshelar@nicira.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | vxlan: add ipv6 proxy supportCong Wang2013-09-012-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the IPv6 version of "arp_reduce", ndisc_send_na() will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: move in6_dev_finish_destroy() into core kernelCong Wang2013-09-012-30/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | in6_dev_put() will be needed by vxlan module, so is in6_dev_finish_destroy(). Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | vxlan: add ipv6 route short circuit supportCong Wang2013-09-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | route short circuit only has IPv4 part, this patch adds the IPv6 part. nd_tbl will be needed. Cc: David S. Miller <davem@davemloft.net> Cc: David Stevens <dlstevens@us.ibm.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: do not call ndisc_send_rs() with write lockCong Wang2013-09-011-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because vxlan module will call ip6_dst_lookup() in TX path, which will hold write lock. So we have to release this write lock before calling ndisc_send_rs(), otherwise could deadlock. Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: export in6addr_loopback to modulesCong Wang2013-09-012-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | It is needed by vxlan module. Noticed by Mike. Cc: Mike Rapoport <mike.rapoport@ravellosystems.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: export a stub for IPv6 symbols used by vxlanCong Wang2013-09-012-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case IPv6 is compiled as a module, introduce a stub for ipv6_sock_mc_join and ipv6_sock_mc_drop etc.. It will be used by vxlan module. Suggested by Ben. This is an ugly but easy solution for now. Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: move ip6_local_out into core kernelCong Wang2013-09-012-25/+26
| | | | | | | | | | | | | | | | | | | | | It will be used the vxlan kernel module. Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: move ip6_dst_hoplimit() into core kernelCong Wang2013-09-012-19/+22
| | | | | | | | | | | | | | | | | | | | | | | | It will be used by vxlan, and may not be inlined. Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Remove redundant sk variableThomas Graf2013-08-301-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | A sk variable initialized to ndisc_sk is already available outside of the branch. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: drop fragmented ndisc packets by default (RFC 6980)Hannes Frederic Sowa2013-08-292-0/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements RFC6980: Drop fragmented ndisc packets by default. If a fragmented ndisc packet is received the user is informed that it is possible to disable the check. Cc: Fernando Gont <fernando@gont.com.ar> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: add IPv6 SYNPROXY targetPatrick McHardy2013-08-283-0/+509
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an IPv6 version of the SYNPROXY target. The main differences to the IPv4 version is routing and IP header construction. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | net: syncookies: export cookie_v6_init_sequence/cookie_v6_checkPatrick McHardy2013-08-281-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract the local TCP stack independant parts of tcp_v6_init_sequence() and cookie_v6_check() and export them for use by the upcoming IPv6 SYNPROXY target. Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: David S. Miller <davem@davemloft.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_conntrack: make sequence number adjustments usuable without NATPatrick McHardy2013-08-281-5/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split out sequence number adjustments from NAT and move them to the conntrack core to make them usable for SYN proxying. The sequence number adjustment information is moved to a seperate extend. The extend is added to new conntracks when a NAT mapping is set up for a connection using a helper. As a side effect, this saves 24 bytes per connection with NAT in the common case that a connection does not have a helper assigned. Signed-off-by: Patrick McHardy <kaber@trash.net> Tested-by: Martin Topholm <mph@one.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_defrag_ipv6.o included twiceNathan Hintz2013-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | 'nf_defrag_ipv6' is built as a separate module; it shouldn't be included in the 'nf_conntrack_ipv6' module as well. Signed-off-by: Nathan Hintz <nlhintz@hotmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: ip[6]t_REJECT: tcp-reset using wrong MAC source if bridgedPhil Oester2013-08-281-1/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As reported by Casper Gripenberg, in a bridged setup, using ip[6]t_REJECT with the tcp-reset option sends out reset packets with the src MAC address of the local bridge interface, instead of the MAC address of the intended destination. This causes some routers/firewalls to drop the reset packet as it appears to be spoofed. Fix this by bypassing ip[6]_local_out and setting the MAC of the sender in the tcp reset packet. This closes netfilter bugzilla #531. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-264-7/+33
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/wireless/iwlwifi/pcie/trans.c include/linux/inetdevice.h The inetdevice.h conflict involves moving the IPV4_DEVCONF values into a UAPI header, overlapping additions of some new entries. The iwlwifi conflict is a context overlap. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: ipv6: mcast: minor: use defines for rfc3810/8.1 lengthsDaniel Borkmann2013-08-211-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of hard-coding length values, use a define to make it clear where those lengths come from. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: ipv6: minor: *_start_timer: rather use unsigned longDaniel Borkmann2013-08-211-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the functions mld_gq_start_timer(), mld_ifc_start_timer(), and mld_dad_start_timer(), rather use unsigned long than int as we operate only on unsigned values anyway. This seems more appropriate as there is no good reason to do type conversions to int, that could lead to future errors. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: ipv6: igmp6_event_query: use msecs_to_jiffiesDaniel Borkmann2013-08-211-6/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use proper API functions to calculate jiffies from milliseconds and not the crude method of dividing HZ by a value. This ensures more accurate values even in the case of strange HZ values. While at it, also simplify code in the mlh2 case by using max(). Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ip6_tunnel: ensure to always have a link local addressNicolas Dichtel2013-08-212-42/+18Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an Xin6 tunnel is set up, we check other netdevices to inherit the link- local address. If none is available, the interface will not have any link-local address. RFC4862 expects that each interface has a link local address. Now than this kind of tunnels supports x-netns, it's easy to fall in this case (by creating the tunnel in a netns where ethernet interfaces stand and then moving it to a other netns where no ethernet interface is available). RFC4291, Appendix A suggests two methods: the first is the one currently implemented, the second is to generate a unique identifier, so that we can always generate the link-local address. Let's use eth_random_addr() to generate this interface indentifier. I remove completly the previous method, hence for the whole life of the interface, the link-local address remains the same (previously, it depends on which ethernet interfaces were up when the tunnel interface was set up). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Revert "ipv6: fix checkpatch errors in net/ipv6/addrconf.c"David S. Miller2013-08-211-14/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit df8372ca747f6da9e8590775721d9363c1dfc87e. These changes are buggy and make unintended semantic changes to ip6_tnl_add_linklocal(). Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv6: fix checkpatch errors in net/ipv6/addrconf.cdingtianhong2013-08-211-10/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ERROR: code indent should use tabs where possible: fix 2. ERROR: do not use assignment in if condition: fix 5. Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv6: convert the uses of ADBG and remove the superfluous parenthesesdingtianhong2013-08-211-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just follow the Joe Perches's opinion, it is a better way to fix the style errors. Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Cc: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'master' of ↵David S. Miller2013-08-201-1/+1
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next Conflicts: net/netfilter/nf_conntrack_proto_tcp.c The conflict had to do with overlapping changes dealing with fixing the use of an "s32" to hold the value returned by NAT_OFFSET(). Pablo Neira Ayuso says: ==================== The following batch contains Netfilter/IPVS updates for your net-next tree. More specifically, they are: * Trivial typo fix in xt_addrtype, from Phil Oester. * Remove net_ratelimit in the conntrack logging for consistency with other logging subsystem, from Patrick McHardy. * Remove unneeded includes from the recently added xt_connlabel support, from Florian Westphal. * Allow to update conntracks via nfqueue, don't need NFQA_CFG_F_CONNTRACK for this, from Florian Westphal. * Remove tproxy core, now that we have socket early demux, from Florian Westphal. * A couple of patches to refactor conntrack event reporting to save a good bunch of lines, from Florian Westphal. * Fix missing locking in NAT sequence adjustment, it did not manifested in any known bug so far, from Patrick McHardy. * Change sequence number adjustment variable to 32 bits, to delay the possible early overflow in long standing connections, also from Patrick. * Comestic cleanups for IPVS, from Dragos Foianu. * Fix possible null dereference in IPVS in the SH scheduler, from Daniel Borkmann. * Allow to attach conntrack expectations via nfqueue. Before this patch, you had to use ctnetlink instead, thus, we save the conntrack lookup. * Export xt_rpfilter and xt_HMARK header files, from Nicolas Dichtel. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | netfilter: nf_conntrack: don't send destroy events from iteratorFlorian Westphal2013-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let nf_ct_delete handle delivery of the DESTROY event. Based on earlier patch from Pablo Neira. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-172-5/+13
| |\ \ \ \
| * | | | | net: proc_fs: trivial: print UIDs as unsigned intFrancesco Fusco2013-08-152-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UIDs are printed in the proc_fs as signed int, whereas they are unsigned int. Signed-off-by: Francesco Fusco <ffusco@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ip6tnl: add x-netns supportNicolas Dichtel2013-08-152-10/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows to switch the netns when packet is encapsulated or decapsulated. In other word, the encapsulated packet is received in a netns, where the lookup is done to find the tunnel. Once the tunnel is found, the packet is decapsulated and injecting into the corresponding interface which stands to another netns. When one of the two netns is removed, the tunnel is destroyed. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv4 tunnels: use net_eq() helper to check netnsNicolas Dichtel2013-08-151-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's better to use available helpers for these tests. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv6: make unsolicited report intervals configurable for mldHannes Frederic Sowa2013-08-142-3/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit cab70040dfd95ee32144f02fade64f0cb94f31a0 ("net: igmp: Reduce Unsolicited report interval to 1s when using IGMPv3") and 2690048c01f32bf45d1c1e1ab3079bc10ad2aea7 ("net: igmp: Allow user-space configuration of igmp unsolicited report interval") by William Manley made igmp unsolicited report intervals configurable per interface and corrected the interval of unsolicited igmpv3 report messages resendings to 1s. Same needs to be done for IPv6: MLDv1 (RFC2710 7.10.): 10 seconds MLDv2 (RFC3810 9.11.): 1 second Both intervals are configurable via new procfs knobs mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval. (also added .force_mld_version to ipv6_devconf_dflt to bring structs in line without semantic changes) v2: a) Joined documentation update for IPv4 and IPv6 MLD/IGMP unsolicited_report_interval procfs knobs. b) incorporate stylistic feedback from William Manley v3: a) add new DEVCONF_* values to the end of the enum (thanks to David Miller) Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: William Manley <william.manley@youview.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: add SNMP counters tracking incoming ECN bitsEric Dumazet2013-08-092-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With GRO/LRO processing, there is a problem because Ip[6]InReceives SNMP counters do not count the number of frames, but number of aggregated segments. Its probably too late to change this now. This patch adds four new counters, tracking number of frames, regardless of LRO/GRO, and on a per ECN status basis, for IPv4 and IPv6. Ip[6]NoECTPkts : Number of packets received with NOECT Ip[6]ECT1Pkts : Number of packets received with ECT(1) Ip[6]ECT0Pkts : Number of packets received with ECT(0) Ip[6]CEPkts : Number of packets received with Congestion Experienced lph37:~# nstat | egrep "Pkts|InReceive" IpInReceives 1634137 0.0 Ip6InReceives 3714107 0.0 Ip6InNoECTPkts 19205 0.0 Ip6InECT0Pkts 52651828 0.0 IpExtInNoECTPkts 33630 0.0 IpExtInECT0Pkts 15581379 0.0 IpExtInCEPkts 6 0.0 Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2013-08-045-40/+45
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge net into net-next to setup some infrastructure Eric Dumazet needs for usbnet changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | fib_rules: fix suppressor names and default valuesStefan Tomanek2013-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change brings the suppressor attribute names into line; it also changes the data types to provide a more consistent interface. While -1 indicates that the suppressor is not enabled, values >= 0 for suppress_prefixlen or suppress_ifgroup reject routing decisions violating the constraint. This changes the previously presented behaviour of suppress_prefixlen, where a prefix length _less_ than the attribute value was rejected. After this change, a prefix length less than *or* equal to the value is considered a violation of the rule constraint. It also changes the default values for default and newly added rules (disabling any suppression for those). Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | fib_rules: add route suppression based on ifgroupStefan Tomanek2013-08-031-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds the ability to suppress a routing decision based upon the interface group the selected interface belongs to. This allows it to exclude specific devices from a routing decision. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | icmpv6_filter: allow ICMPv6 messages with bodies < 4 bytesWerner Almesberger2013-08-031-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By using sizeof(_hdr), net/ipv6/raw.c:icmpv6_filter implicitly assumes that any valid ICMPv6 message is at least eight bytes long, i.e., that the message body is at least four bytes. The DIS message of RPL (RFC 6550 section 6.2, from the 6LoWPAN world), has a minimum length of only six bytes, and is thus blocked by icmpv6_filter. RFC 4443 seems to allow even a zero-sized body, making the minimum allowable message size four bytes. Signed-off-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | icmpv6_filter: fix "_hdr" incorrectly being a pointerWerner Almesberger2013-08-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "_hdr" should hold the ICMPv6 header while "hdr" is the pointer to it. This worked by accident. Signed-off-by: Werner Almesberger <werner@almesberger.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | ipv6: bump genid when delete/add addressfan.du2013-08-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Server Client 2001:1::803/64 <-> 2001:1::805/64 2001:2::804/64 <-> 2001:2::806/64 Server side fib binary tree looks like this: (2001:/64) / / ffff88002103c380 / \ (2) / \ (2001::803/128) ffff880037ac07c0 / \ / \ (3) ffff880037ac0640 (2001::806/128) / \ (1) / \ (2001::804/128) (2001::805/128) Delete 2001::804/64 won't cause prefix route deleted as well as rt in (3) destinate to 2001::806 with source address as 2001::804/64. That's because 2001::803/64 is still alive, which make onlink=1 in ipv6_del_addr, this is where the substantial difference between same prefix configuration and different prefix configuration :) So packet are still transmitted out to 2001::806 with source address as 2001::804/64. So bump genid will clear rt in (3), and up layer protocol will eventually find the right one for themselves. This problem arised from the discussion in here: http://marc.info/?l=linux-netdev&m=137404469219410&w=4 Signed-off-by: Fan Du <fan.du@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | ipv6: fib6_rules should return exact return valueHannes Frederic Sowa2013-08-011-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the addition of the suppress operation (7764a45a8f1fe74d4f7d301eaca2e558e7e2831a ("fib_rules: add .suppress operation") we rely on accurate error reporting of the fib_rules.actions. fib6_rule_action always returned -EAGAIN in case we could not find a matching route and 0 if a rule was matched. This also included a match for blackhole or prohibited rule actions which could get suppressed by the new logic. So adapt fib6_rule_action to always return the correct error code as its counterpart fib4_rule_action does. This also fixes a possiblity of nullptr-deref where we don't find a table, thus rt == NULL. Because the condition rt != ip6_null_entry still holdes it seems we could later get a nullptr bug on dereference rt->dst. v2: a) Fixed a brain fart in the commit msg (the rule => a table, etc). No changes to the patch. Cc: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | fib_rules: add .suppress operationStefan Tomanek2013-08-011-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change adds a new operation to the fib_rules_ops struct; it allows the suppression of routing decisions if certain criteria are not met by its results. The first implemented constraint is a minimum prefix length added to the structures of routing rules. If a rule is added with a minimum prefix length >0, only routes meeting this threshold will be considered. Any other (more general) routing table entries will be ignored. When configuring a system with multiple network uplinks and default routes, it is often convinient to reference the main routing table multiple times - but omitting the default route. Using this patch and a modified "ip" utility, this can be achieved by using the following command sequence: $ ip route add table secuplink default via 10.42.23.1 $ ip rule add pref 100 table main prefixlength 1 $ ip rule add pref 150 fwmark 0xA table secuplink With this setup, packets marked 0xA will be processed by the additional routing table "secuplink", but only if no suitable route in the main routing table can be found. By using a minimal prefixlength of 1, the default route (/0) of the table "main" is hidden to packets processed by rule 100; packets traveling to destinations with more specific routing entries are processed as usual. Signed-off-by: Stefan Tomanek <stefan.tomanek@wertarbyte.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: split rt_genid for ipv4 and ipv6fan.du2013-07-312-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current net name space has only one genid for both IPv4 and IPv6, it has below drawbacks: - Add/delete an IPv4 address will invalidate all IPv6 routing table entries. - Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table entries even when the policy is only applied for one address family. Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6 separately in a fine granularity. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | tcp: add tcp_syncookies mode to allow unconditionally generation of syncookiesHannes Frederic Sowa2013-07-311-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you want to test which effects syncookies have to your | network connections you can set this knob to 2 to enable | unconditionally generation of syncookies. Original idea and first implementation by Eric Dumazet. Cc: Florian Westphal <fw@strlen.de> Cc: David Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | ipv4, ipv6: send igmpv3/mld packets with TC_PRIO_CONTROLHannes Frederic Sowa2013-07-281-1/+3
| | |_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v2: a) Also send ipv4 igmp messages with TC_PRIO_CONTROL Cc: William Manley <william.manley@youview.com> Cc: Lukas Tribus <luky-37@hotmail.com> Acked-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | tcp: TCP_NOTSENT_LOWAT socket optionEric Dumazet2013-07-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Idea of this patch is to add optional limitation of number of unsent bytes in TCP sockets, to reduce usage of kernel memory. TCP receiver might announce a big window, and TCP sender autotuning might allow a large amount of bytes in write queue, but this has little performance impact if a large part of this buffering is wasted : Write queue needs to be large only to deal with large BDP, not necessarily to cope with scheduling delays (incoming ACKS make room for the application to queue more bytes) For most workloads, using a value of 128 KB or less is OK to give applications enough time to react to POLLOUT events in time (or being awaken in a blocking sendmsg()) This patch adds two ways to set the limit : 1) Per socket option TCP_NOTSENT_LOWAT 2) A sysctl (/proc/sys/net/ipv4/tcp_notsent_lowat) for sockets not using TCP_NOTSENT_LOWAT socket option (or setting a zero value) Default value being UINT_MAX (0xFFFFFFFF), meaning this has no effect. This changes poll()/select()/epoll() to report POLLOUT only if number of unsent bytes is below tp->nosent_lowat Note this might increase number of sendmsg()/sendfile() calls when using non blocking sockets, and increase number of context switches for blocking sockets. Note this is not related to SO_SNDLOWAT (as SO_SNDLOWAT is defined as : Specify the minimum number of bytes in the buffer until the socket layer will pass the data to the protocol) Tested: netperf sessions, and watching /proc/net/protocols "memory" column for TCP With 200 concurrent netperf -t TCP_STREAM sessions, amount of kernel memory used by TCP buffers shrinks by ~55 % (20567 pages instead of 45458) lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 45458 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 45458 no 208 yes kernel y y y y y y y y y y y y y n y y y y y lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# (super_netperf 200 -t TCP_STREAM -H remote -l 90 &); sleep 60 ; grep TCP /proc/net/protocols TCPv6 1880 2 20567 no 208 yes ipv6 y y y y y y y y y y y y y n y y y y y TCP 1696 508 20567 no 208 yes kernel y y y y y y y y y y y y y n y y y y y Using 128KB has no bad effect on the throughput or cpu usage of a single flow, although there is an increase of context switches. A bonus is that we hold socket lock for a shorter amount of time and should improve latencies of ACK processing. lpq83:~# echo -1 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1651584 6291456 16384 20.00 17447.90 10^6bits/s 3.13 S -1.00 U 0.353 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 412,514 context-switches 200.034645535 seconds time elapsed lpq83:~# echo 131072 >/proc/sys/net/ipv4/tcp_notsent_lowat lpq83:~# perf stat -e context-switches ./netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3 OMNI Send TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.7.84 () port 0 AF_INET : +/-2.500% @ 99% conf. Local Remote Local Elapsed Throughput Throughput Local Local Remote Remote Local Remote Service Send Socket Recv Socket Send Time Units CPU CPU CPU CPU Service Service Demand Size Size Size (sec) Util Util Util Util Demand Demand Units Final Final % Method % Method 1593240 6291456 16384 20.00 17321.16 10^6bits/s 3.35 S -1.00 U 0.381 -1.000 usec/KB Performance counter stats for './netperf -H 7.7.7.84 -t omni -l 20 -c -i10,3': 2,675,818 context-switches 200.029651391 seconds time elapsed Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-By: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: ipv6 eliminate parameter "int addrlen" in function fib6_add_1fan.du2013-07-241-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "int addrlen" in fib6_add_1 is rebundant, as we can get it from parameter "struct in6_addr *addr" once we modified its type. And also fix some coding style issues in fib6_add_1 Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net ipv6: Remove rebundant rt6i_nsiblings initializationfan.du2013-07-241-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Seting rt->rt6i_nsiblings to zero is rebundant, because above memset zeroed the rest of rt excluding the first dst memember. Signed-off-by: Fan Du <fan.du@windriver.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ip6mr: change the prototype of ip6_mr_forward().Rami Rosen2013-07-241-6/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the prototpye of the ip6_mr_forward() method to return void instead of int. The ip6_mr_forward() method always returns 0; moreover, the return value of this method is not checked anywhere. Signed-off-by: Rami Rosen <ramirose@gmail.com> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>