summaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAgeFilesLines
* sctp: Send user messages to the lower layer as oneVlad Yasevich2009-09-054-15/+57
| | | | | | | | | | | | | | | | | Currenlty, sctp breaks up user messages into fragments and sends each fragment to the lower layer by itself. This means that for each fragment we go all the way down the stack and back up. This also discourages bundling of multiple fragments when they can fit into a sigle packet (ex: due to user setting a low fragmentation threashold). We introduce a new command SCTP_CMD_SND_MSG and hand the whole message down state machine. The state machine and the side-effect parser will cork the queue, add all chunks from the message to the queue, and then un-cork the queue thus causing the chunks to get transmitted. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: Try to encourage SACK bundling with DATA.Vlad Yasevich2009-09-051-5/+16
| | | | | | | | | If the association has a SACK timer pending and now DATA queued to be send, we'll try to bundle the SACK with the next application send. As such, try encourage bundling by accounting for SACK in the size of the first chunk fragment. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: Generate SACKs when actually sending outbound DATAVlad Yasevich2009-09-051-57/+85
| | | | | | | | | | | | | | | We are now trying to bundle SACKs when we have outbound DATA to send. However, there are situations where this outbound DATA will not be sent (due to congestion or available window). In such cases it's ok to wait for the timer to expire. This patch refactors the sending code so that betfore attempting to bundle the SACK we check to see if the DATA will actually be transmitted. Based on eirlier works for Doug Graham <dgraham@nortel.com> and Wei Youngjun <yjwei@cn.fujitsu.com>. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: Fix data segmentation with small frag_sizeVlad Yasevich2009-09-051-9/+23
| | | | | | | | | | | | | | | Since an application may specify the maximum SCTP fragment size that all data should be fragmented to, we need to fix how we do segmentation. Right now, if a user specifies a small fragment size, the segment size can go negative in the presence of AUTH or COOKIE_ECHO bundling. What we need to do is track the largest possbile DATA chunk that can fit into the mtu. Then if the fragment size specified is bigger then this maximum length, we'll shrink it down. Otherwise, we just use the smaller segment size without changing it further. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: Disallow new connection on a closing socketVlad Yasevich2009-09-052-0/+10
| | | | | | | | | | | If a socket has a lot of association that are in the process of of being closed/aborted, it is possible for a remote to establish new associations during the time period that the old ones are shutting down. If this was a result of a close() call, there will be no socket and will cause a memory leak. We'll prevent this by setting the socket state to CLOSING and disallow new associations when in this state. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: Fix piggybacked ACKsDoug Graham2009-09-051-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch corrects the conditions under which a SACK will be piggybacked on a DATA packet. The previous condition was incorrect due to a misinterpretation of RFC 4960 and/or RFC 2960. Specifically, the following paragraph from section 6.2 had not been implemented correctly: Before an endpoint transmits a DATA chunk, if any received DATA chunks have not been acknowledged (e.g., due to delayed ack), the sender should create a SACK and bundle it with the outbound DATA chunk, as long as the size of the final SCTP packet does not exceed the current MTU. See Section 6.2. When about to send a DATA chunk, the code now checks to see if the SACK timer is running. If it is, we know we have a SACK to send to the peer, so we append the SACK (assuming available space in the packet) and turn off the timer. For a simple request-response scenario, this will result in the SACK being bundled with the response, meaning the the SACK is received quickly by the client, and also meaning that no separate SACK packet needs to be sent by the server to acknowledge the request. Prior to this patch, a separate SACK packet would have been sent by the server SCTP only after its delayed-ACK timer had expired (usually 200ms). This is wasteful of bandwidth, and can also have a major negative impact on performance due the interaction of delayed ACKs with the Nagle algorithm. Signed-off-by: Doug Graham <dgraham@nortel.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: release cached route when the transport goes down.Vlad Yasevich2009-09-051-2/+7
| | | | | | | | | When the sctp transport is marked down, we can release the cached route and force a new lookup when attempting to use this transport for anything. This way, if a better route or source address is available, we'll try to use it. Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: update the route for non-active transports after addresses are addedWei Yongjun2009-09-051-0/+8
| | | | | | | | | Update the route and saddr entries for the non-active transports as some of the added addresses can be used as better source addresses, or may be there is a better route. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: check the unrecognized ASCONF parameter before access itWei Yongjun2009-09-051-3/+5
| | | | | | | | This patch fix to check the unrecognized ASCONF parameter before access it. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* sctp: avoid overwrite the return value of sctp_process_asconf_ack()Wei Yongjun2009-09-051-6/+3Star
| | | | | | | | | The return value of sctp_process_asconf_ack() may be overwritten while process parameters with no error. This patch fixed the problem. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
* ipv6: Fix tcp_v6_send_response(): it didn't set skb transport headerCosmin Ratiu2009-09-041-0/+1
| | | | | | | | | | | | | | | | | | Here is a patch which fixes an issue observed when using TCP over IPv6 and AH from IPsec. When a connection gets closed the 4-way method and the last ACK from the server gets dropped, the subsequent FINs from the client do not get ACKed because tcp_v6_send_response does not set the transport header pointer. This causes ah6_output to try to allocate a lot of memory, which typically fails, so the ACKs never make it out of the stack. I have reproduced the problem on kernel 2.6.7, but after looking at the latest kernel it seems the problem is still there. Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: adds drops accountingEric Dumazet2009-09-041-7/+22
| | | | | | | | | | | | Its hard to tell if vlans are dropping frames, since every frame given to vlan_???_start_xmit() functions is accounted as fully transmitted by lower device. We can test dev_queue_xmit() return values to properly account for dropped frames. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Remove debugging codeEric Dumazet2009-09-031-2/+0Star
| | | | | | | Remove a debugging aid I accidently left in previous 'cleanup' patch Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: enable multiqueue xmitsEric Dumazet2009-09-031-2/+4
| | | | | | | | | | | | | | | | | vlan_dev_hard_start_xmit() & vlan_dev_hwaccel_hard_start_xmit() select txqueue number 0, instead of using index provided by skb_get_queue_mapping(). This is not correct after commit 2e59af3dcbdf11635c03f [vlan: multiqueue vlan device] because txq->tx_packets & txq->tx_bytes changes are performed on a single location, and not the right locking. Fix is to take the appropriate struct netdev_queue pointer Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: net/core/dev.c cleanupsEric Dumazet2009-09-031-297/+292Star
| | | | | | | Pure style cleanup patch before surgery :) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* atm/br2684: netif_stop_queue() when atm device busy and netif_wake_queue() ↵Karl Hiramoto2009-09-031-10/+27
| | | | | | | | | | | | | when we can send packets again. This patch removes the call to dev_kfree_skb() when the atm device is busy. Calling dev_kfree_skb() causes heavy packet loss then the device is under heavy load, the more correct behavior should be to stop the upper layers, then when the lower device can queue packets again wake the upper layers. Signed-off-by: Karl Hiramoto <karl@hiramoto.org> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: replace hard coded GFP_KERNEL with sk_allocationWu Fengguang2009-09-035-12/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | This fixed a lockdep warning which appeared when doing stress memory tests over NFS: inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock mount_root => nfs_root_data => tcp_close => lock sk_lock => tcp_send_fin => alloc_skb_fclone => page reclaim David raised a concern that if the allocation fails in tcp_send_fin(), and it's GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting for the allocation to succeed. But fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks weird, but it is no worse the implicit sleep inside GFP_KERNEL. Both could loop endlessly under memory pressure. CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: David S. Miller <davem@davemloft.net> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ethtool: Add support for the ethtool feature to flash firmware image ↵Ajit Khaparde2009-09-031-0/+16
| | | | | | | | | | | | | | | | | from a specified file. This patch adds support to flash a firmware image to a device using ethtool. The driver gets the filename of the firmware image and flashes the image using the request firmware path. The region "on the chip" to be flashed can be specified by an option. It is upto the device driver to enumerate the region number passed by ethtool, to the region to be flashed. The default behavior is to flash all the regions on the chip. Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ip: Report qdisc packet dropsEric Dumazet2009-09-036-11/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Christoph Lameter pointed out that packet drops at qdisc level where not accounted in SNMP counters. Only if application sets IP_RECVERR, drops are reported to user (-ENOBUFS errors) and SNMP counters updated. IP_RECVERR is used to enable extended reliable error message passing, but these are not needed to update system wide SNMP stats. This patch changes things a bit to allow SNMP counters to be updated, regardless of IP_RECVERR being set or not on the socket. Example after an UDP tx flood # netstat -s ... IP: 1487048 outgoing packets dropped ... Udp: ... SndbufErrors: 1487048 send() syscalls, do however still return an OK status, to not break applications. Note : send() manual page explicitly says for -ENOBUFS error : "The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.) " This is not true for IP_RECVERR enabled sockets : a send() syscall that hit a qdisc drop returns an ENOBUFS error. Many thanks to Christoph, David, and last but not least, Alexey ! Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* vlan: multiqueue vlan deviceEric Dumazet2009-09-033-3/+32
| | | | | | | | | | | | | | | vlan devices are currently not multi-queue capable. We can do that with a new rtnl_link_ops method, get_tx_queues(), called from rtnl_create_link() This new method gets num_tx_queues/real_num_tx_queues from real device. register_vlan_device() is also handled. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: drop_monitor: make last_rx timestamp privateNeil Horman2009-09-021-3/+12
| | | | | | | | | | | | | | It was recently pointed out to me that the last_rx field of the net_device structure wasn't updated regularly. In fact only the bonding driver really uses it currently. Since the drop_monitor code relies on the last_rx field to detect drops on recevie in hardware, We need to find a more reliable way to rate limit our drop checks (so that we don't check for drops on every frame recevied, which would be inefficient. This patch makes a last_rx timestamp that is private to the drop monitor code and is updated for every device that we track. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2009-09-023-4/+7
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6
| * cfg80211: fix looping soft lockup in find_ie()Bob Copeland2009-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The find_ie() function uses a size_t for the len parameter, and directly uses len as a loop variable. If any received packets are malformed, it is possible for the decrease of len to overflow, and since the result is unsigned, the loop will not terminate. Change it to a signed int so the loop conditional works for negative values. This fixes the following soft lockup: [38573.102007] BUG: soft lockup - CPU#0 stuck for 61s! [phy0:2230] [38573.102007] Modules linked in: aes_i586 aes_generic fuse af_packet ipt_REJECT xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_state iptable_filter ip_tables x_tables acpi_cpufreq binfmt_misc dm_mirror dm_region_hash dm_log dm_multipath dm_mod kvm_intel kvm uinput i915 arc4 ecb drm snd_hda_codec_idt ath5k snd_hda_intel hid_apple mac80211 usbhid appletouch snd_hda_codec snd_pcm ath cfg80211 snd_timer i2c_algo_bit ohci1394 video snd processor ieee1394 rfkill ehci_hcd sg sky2 backlight snd_page_alloc uhci_hcd joydev output ac thermal button battery sr_mod applesmc cdrom input_polldev evdev unix [last unloaded: scsi_wait_scan] [38573.102007] irq event stamp: 2547724535 [38573.102007] hardirqs last enabled at (2547724534): [<c1002ffc>] restore_all_notrace+0x0/0x18 [38573.102007] hardirqs last disabled at (2547724535): [<c10038f4>] apic_timer_interrupt+0x28/0x34 [38573.102007] softirqs last enabled at (92950144): [<c103ab48>] __do_softirq+0x108/0x210 [38573.102007] softirqs last disabled at (92950274): [<c1348e74>] _spin_lock_bh+0x14/0x80 [38573.102007] [38573.102007] Pid: 2230, comm: phy0 Tainted: G W (2.6.31-rc7-wl #8) MacBook1,1 [38573.102007] EIP: 0060:[<f8ea2d50>] EFLAGS: 00010292 CPU: 0 [38573.102007] EIP is at cmp_ies+0x30/0x180 [cfg80211] [38573.102007] EAX: 00000082 EBX: 00000000 ECX: ffffffc1 EDX: d8efd014 [38573.102007] ESI: ffffff7c EDI: 0000004d EBP: eee2dc50 ESP: eee2dc3c [38573.102007] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 [38573.102007] CR0: 8005003b CR2: d8efd014 CR3: 01694000 CR4: 000026d0 [38573.102007] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000 [38573.102007] DR6: ffff0ff0 DR7: 00000400 [38573.102007] Call Trace: [38573.102007] [<f8ea2f8d>] cmp_bss+0xed/0x100 [cfg80211] [38573.102007] [<f8ea33e4>] cfg80211_bss_update+0x84/0x410 [cfg80211] [38573.102007] [<f8ea3884>] cfg80211_inform_bss_frame+0x114/0x180 [cfg80211] [38573.102007] [<f97255ff>] ieee80211_bss_info_update+0x4f/0x180 [mac80211] [38573.102007] [<f972b118>] ieee80211_rx_bss_info+0x88/0xf0 [mac80211] [38573.102007] [<f9739297>] ? ieee802_11_parse_elems+0x27/0x30 [mac80211] [38573.102007] [<f972b224>] ieee80211_rx_mgmt_probe_resp+0xa4/0x1c0 [mac80211] [38573.102007] [<f972bc59>] ieee80211_sta_rx_queued_mgmt+0x919/0xc50 [mac80211] [38573.102007] [<c1009707>] ? sched_clock+0x27/0xa0 [38573.102007] [<c1009707>] ? sched_clock+0x27/0xa0 [38573.102007] [<c105ffd0>] ? mark_held_locks+0x60/0x80 [38573.102007] [<c1348be5>] ? _spin_unlock_irqrestore+0x55/0x70 [38573.102007] [<c134baa5>] ? sub_preempt_count+0x85/0xc0 [38573.102007] [<c1348bce>] ? _spin_unlock_irqrestore+0x3e/0x70 [38573.102007] [<c12c1c0f>] ? skb_dequeue+0x4f/0x70 [38573.102007] [<f972c021>] ieee80211_sta_work+0x91/0xb80 [mac80211] [38573.102007] [<c1009707>] ? sched_clock+0x27/0xa0 [38573.102007] [<c134baa5>] ? sub_preempt_count+0x85/0xc0 [38573.102007] [<c10479af>] worker_thread+0x18f/0x320 [38573.102007] [<c104794e>] ? worker_thread+0x12e/0x320 [38573.102007] [<c1348be5>] ? _spin_unlock_irqrestore+0x55/0x70 [38573.102007] [<f972bf90>] ? ieee80211_sta_work+0x0/0xb80 [mac80211] [38573.102007] [<c104cbb0>] ? autoremove_wake_function+0x0/0x50 [38573.102007] [<c1047820>] ? worker_thread+0x0/0x320 [38573.102007] [<c104c854>] kthread+0x84/0x90 [38573.102007] [<c104c7d0>] ? kthread+0x0/0x90 [38573.102007] [<c1003ab7>] kernel_thread_helper+0x7/0x10 Cc: stable@kernel.org Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * wireless: remove mac80211 rate selection extra menuLuis R. Rodriguez2009-09-021-3/+2Star
| | | | | | | | | | | | | | | | | | We can just display this upon enabling mac80211 with an 'if MAC80211 != n' check. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * wireless: update reg debug kconfig entryLuis R. Rodriguez2009-09-021-0/+4
| | | | | | | | | | | | | | Refer to the wireless wiki for more information. Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | net: file_operations should be constStephen Hemminger2009-09-027-14/+14
| | | | | | | | | | | | | | All instances of file_operations should be const. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: inet_connection_sock_af_ops constStephen Hemminger2009-09-024-10/+10
| | | | | | | | | | | | | | | | The function block inet_connect_sock_af_ops contains no data make it constant. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: MD5 operations should be constStephen Hemminger2009-09-022-7/+7
| | | | | | | | | | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: seq_operations should be constStephen Hemminger2009-09-022-2/+2
| | | | | | | | | | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2009-09-0215-28/+44
|\ \ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/yellowfin.c
| * | pkt_sched: Revert tasklet_hrtimer changes.David S. Miller2009-09-022-19/+16Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are full of unresolved problems, mainly that conversions don't work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers tasklets can't be killed from softirq context. And when a qdisc gets reset, that's exactly what we need to do here. We'll work this out in the net-next-2.6 tree and if warranted we'll backport that work to -stable. This reverts the following 3 changesets: a2cb6a4dd470d7a64255a10b843b0d188416b78f ("pkt_sched: Fix bogon in tasklet_hrtimer changes.") 38acce2d7983632100a9ff3fd20295f6e34074a8 ("pkt_sched: Convert CBQ to tasklet_hrtimer.") ee5f9757ea17759e1ce5503bdae2b07e48e32af9 ("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer") Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: sk_free() should be allowed right after sk_alloc()Jarek Poplawski2009-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) sk_free() frees socks conditionally and depends on sk_wmem_alloc being set e.g. in sock_init_data(). But in some cases sk_free() is called earlier, usually after other alloc errors. Fix is to move sk_wmem_alloc initialization from sock_init_data() to sk_alloc() itself. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | pkt_sched: Fix bogon in tasklet_hrtimer changes.David S. Miller2009-08-252-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reported by Stephen Rothwell, luckily it's harmless: net/sched/sch_api.c: In function 'qdisc_watchdog': net/sched/sch_api.c:460: warning: initialization from incompatible pointer type net/sched/sch_cbq.c: In function 'cbq_undelay': net/sched/sch_cbq.c:595: warning: initialization from incompatible pointer type Signed-off-by: David S. Miller <davem@davemloft.net>
| * | NET: llc, zero sockaddr_llc structJiri Slaby2009-08-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | sllc_arphrd member of sockaddr_llc might not be changed. Zero sllc before copying to the above layer's structure. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netpoll: warning for ndo_start_xmit returns with interrupts enabledDongdong Deng2009-08-241-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | WARN_ONCE for ndo_start_xmit() enable interrupts in netpoll_send_skb(), because the NETPOLL API requires that interrupts remain disabled in netpoll_send_skb(). Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com> Acked-by: Matt Mackall <mpm@selenic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: xt_quota: fix wrong return value (error case)Patrick McHardy2009-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Success was indicated on a memory allocation failure, thereby causing a crash due to a later NULL deref. (Affects v2.6.30-rc1 up to here.) Signed-off-by: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv6: Fix commit 63d9950b08184e6531adceb65f64b429909cc101 (ipv6: Make ↵Bruno Prémont2009-08-241-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | v4-mapped bindings consistent with IPv4) Commit 63d9950b08184e6531adceb65f64b429909cc101 (ipv6: Make v4-mapped bindings consistent with IPv4) changes behavior of inet6_bind() for v4-mapped addresses so it should behave the same way as inet_bind(). During this change setting of err to -EADDRNOTAVAIL got lost: af_inet.c:469 inet_bind() err = -EADDRNOTAVAIL; if (!sysctl_ip_nonlocal_bind && !(inet->freebind || inet->transparent) && addr->sin_addr.s_addr != htonl(INADDR_ANY) && chk_addr_ret != RTN_LOCAL && chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST) goto out; af_inet6.c:463 inet6_bind() if (addr_type == IPV6_ADDR_MAPPED) { int chk_addr_ret; /* Binding to v4-mapped address on a v6-only socket * makes no sense */ if (np->ipv6only) { err = -EINVAL; goto out; } /* Reproduce AF_INET checks to make the bindings consitant */ v4addr = addr->sin6_addr.s6_addr32[3]; chk_addr_ret = inet_addr_type(net, v4addr); if (!sysctl_ip_nonlocal_bind && !(inet->freebind || inet->transparent) && v4addr != htonl(INADDR_ANY) && chk_addr_ret != RTN_LOCAL && chk_addr_ret != RTN_MULTICAST && chk_addr_ret != RTN_BROADCAST) goto out; } else { Signed-off-by Bruno Prémont <bonbons@linux-vserver.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | pkt_sched: Convert CBQ to tasklet_hrtimer.David S. Miller2009-08-241-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | This code expects to run in softirq context, and bare hrtimers run in hw IRQ context. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Gleixner <tglx@linutronix.de>
| * | pkt_sched: Convert qdisc_watchdog to tasklet_hrtimerDavid S. Miller2009-08-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | None of this stuff should execute in hw IRQ context, therefore use a tasklet_hrtimer so that it runs in softirq context. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Gleixner <tglx@linutronix.de>
| * | Merge branch 'master' of ↵David S. Miller2009-08-191-13/+15
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6
| | * | mac80211: fix todo lockJohannes Berg2009-08-171-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The key todo lock can be taken from different locks that require it to be _bh to avoid lock inversion due to (soft)irqs. This should fix the two problems reported by Bob and Gabor: http://mid.gmane.org/20090619113049.GB18956@hash.localnet http://mid.gmane.org/4A3FA376.8020307@openwrt.org Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: Bob Copeland <me@bobcopeland.com> Cc: Gabor Juhos <juhosg@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>
| * | | net: restore gnet_stats_basic to previous definitionEric Dumazet2009-08-188-15/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 5e140dfc1fe87eae27846f193086724806b33c7d "net: reorder struct Qdisc for better SMP performance" the definition of struct gnet_stats_basic changed incompatibly, as copies of this struct are shipped to userland via netlink. Restoring old behavior is not welcome, for performance reason. Fix is to use a private structure for kernel, and teach gnet_stats_copy_basic() to convert from kernel to user land, using legacy structure (struct gnet_stats_basic) Based on a report and initial patch from Michael Spang. Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | NETROM: Fix use of static bufferRalf Baechle2009-08-181-9/+12
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | The static variable used by nr_call_to_digi might result in corruption if multiple threads are trying to usee a node or neighbour via ioctl. Fixed by having the caller pass a structure in. This is safe because nr_add_node rsp. nr_add_neigh will allocate a permanent structure, if needed. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | gre: Fix MTU calculation for bound GRE tunnelsTom Goff2009-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The GRE header length should be subtracted when the tunnel MTU is calculated. This just corrects for the associativity change introduced by commit 42aa916265d740d66ac1f17290366e9494c884c2 ("gre: Move MTU setting out of ipgre_tunnel_bind_dev"). Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDSEric Dumazet2009-09-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | drop_monitor: fix trace_napi_poll_hit()Xiao Guangrong2009-09-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The net_dev of backlog napi is NULL, like below: __get_cpu_var(softnet_data).backlog.dev == NULL So, we should check it in napi tracepoint's probe function Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: make neigh_ops constantStephen Hemminger2009-09-024-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | These tables are never modified at runtime. Move to read-only section. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netns: embed ip6_dst_ops directlyAlexey Dobriyan2009-09-021-21/+13Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated, but there is no fundamental reason for it. Embed it directly into struct netns_ipv6. For that: * move struct dst_ops into separate header to fix circular dependencies I honestly tried not to, it's pretty impossible to do other way * drop dynamical allocation, allocate together with netns For a change, remove struct dst_ops::dst_net, it's deducible by using container_of() given dst_ops pointer. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.Damian Lukowski2009-09-011-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC 1122 specifies two threshold values R1 and R2 for connection timeouts, which may represent a number of allowed retransmissions or a timeout value. Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds in number of allowed retransmissions. For any desired threshold R2 (by means of time) one can specify tcp_retries2 (by means of number of retransmissions) such that TCP will not time out earlier than R2. This is the case, because the RTO schedule follows a fixed pattern, namely exponential backoff. However, the RTO behaviour is not predictable any more if RTO backoffs can be reverted, as it is the case in the draft "Make TCP more Robust to Long Connectivity Disruptions" (http://tools.ietf.org/html/draft-zimmermann-tcp-lcd). In the worst case TCP would time out a connection after 3.2 seconds, if the initial RTO equaled MIN_RTO and each backoff has been reverted. This patch introduces a function retransmits_timed_out(N), which calculates the timeout of a TCP connection, assuming an initial RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions. Whenever timeout decisions are made by comparing the retransmission counter to some value N, this function can be used, instead. The meaning of tcp_retries2 will be changed, as many more RTO retransmissions can occur than the value indicates. However, it yields a timeout which is similar to the one of an unpatched, exponentially backing off TCP in the same scenario. As no application could rely on an RTO greater than MIN_RTO, there should be no risk of a regression. Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Revert Backoff [v3]: Revert RTO on ICMP destination unreachableDamian Lukowski2009-09-013-4/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here, an ICMP host/network unreachable message, whose payload fits to TCP's SND.UNA, is taken as an indication that the RTO retransmission has not been lost due to congestion, but because of a route failure somewhere along the path. With true congestion, a router won't trigger such a message and the patched TCP will operate as standard TCP. This patch reverts one RTO backoff, if an ICMP host/network unreachable message, whose payload fits to TCP's SND.UNA, arrives. Based on the new RTO, the retransmission timer is reset to reflect the remaining time, or - if the revert clocked out the timer - a retransmission is sent out immediately. Backoffs are only reverted, if TCP is in RTO loss recovery, i.e. if there have been retransmissions and reversible backoffs, already. Changes from v2: 1) Renaming of skb in tcp_v4_err() moved to another patch. 2) Reintroduced tcp_bound_rto() and __tcp_set_rto(). 3) Fixed code comments. Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>