summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* sched: Optimize branch hint in pick_next_task_fair()Tim Blechmann2009-11-241-1/+1
| | | | | | | | | | | | | | | | | Branch hint profiling on my nehalem machine showed 90% incorrect branch hints: 15728471 158903754 90 pick_next_task_fair sched_fair.c 1555 Signed-off-by: Tim Blechmann <tim@klingt.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4B0BBBB1.2050100@klingt.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched_feat_write(): Update ppos instead of file->f_posJan Blunck2009-11-231-1/+1
| | | | | | | | | | | | | | | | | sched_feat_write() should update ppos instead of file->f_pos. (This reduces some BKL dependencies of this code.) Signed-off-by: Jan Blunck <jblunck@suse.de> Cc: jkacur@redhat.com Cc: Arnd Bergmann <arnd@arndb.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> LKML-Reference: <1258735245-25826-8-git-send-email-jblunck@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: More generic WAKE_AFFINE vs select_idle_sibling()Peter Zijlstra2009-11-131-17/+16Star
| | | | | | | | | | | Instead of only considering SD_WAKE_AFFINE | SD_PREFER_SIBLING domains also allow all SD_PREFER_SIBLING domains below a SD_WAKE_AFFINE domain to change the affinity target. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091112145610.909723612@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Cleanup select_task_rq_fair()Peter Zijlstra2009-11-131-22/+51
| | | | | | | | | | Clean up the new affine to idle sibling bits while trying to grok them. Should not have any function differences. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <20091112145610.832503781@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Fix granularity of task_u/stime()Hidetoshi Seto2009-11-121-9/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Originally task_s/utime() were designed to return clock_t but later changed to return cputime_t by following commit: commit efe567fc8281661524ffa75477a7c4ca9b466c63 Author: Christian Borntraeger <borntraeger@de.ibm.com> Date: Thu Aug 23 15:18:02 2007 +0200 It only changed the type of return value, but not the implementation. As the result the granularity of task_s/utime() is still that of clock_t, not that of cputime_t. So using task_s/utime() in __exit_signal() makes values accumulated to the signal struct to be rounded and coarse grained. This patch removes casts to clock_t in task_u/stime(), to keep granularity of cputime_t over the calculation. v2: Use div_u64() to avoid error "undefined reference to `__udivdi3`" on some 32bit systems. Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: xiyou.wangcong@gmail.com Cc: Spencer Candland <spencer@bluehost.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Stanislaw Gruszka <sgruszka@redhat.com> LKML-Reference: <4AFB9029.9000208@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Make sure task has correct sched_class after policy changePeter Zijlstra2009-11-101-12/+4Star
| | | | | | | | | | | | | | | | From the code in rt_mutex_setprio(), it is evident that the intention is that task's with a RT 'prio' value as a consequence of receiving a PI boost also have their 'sched_class' field set to '&rt_sched_class'. However, Peter noticed that the code in __setscheduler() could result in this intention being frustrated. Fix it. Reported-by: Peter Williams <pwil3058@bigpond.net.au> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <1257880321.4108.457.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Fix and clean up rate-limit newidle codeMike Galbraith2009-11-101-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | Commit 1b9508f, "Rate-limit newidle" has been confirmed to fix the netperf UDP loopback regression reported by Alex Shi. This is a cleanup and a fix: - moved to a more out of the way spot - fix to ensure that balancing doesn't try to balance runqueues which haven't gone online yet, which can mess up CPU enumeration during boot. Reported-by: Alex Shi <alex.shi@intel.com> Reported-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> # .32.x: a1f84a3: sched: Check for an idle shared cache Cc: <stable@kernel.org> # .32.x: 1b9508f: sched: Rate-limit newidle Cc: <stable@kernel.org> # .32.x: fd21073: sched: Fix affinity logic Cc: <stable@kernel.org> # .32.x LKML-Reference: <1257821402.5648.17.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched, no_hz: Remove unused rq->last_tick_seen fieldLai Jiangshan2009-11-081-1/+0Star
| | | | | | | | | | | In 15934a37324f32e0fda633dc7984a671ea81cd75, field last_tick_seen is added to struct rq. But it is unused now. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Guillaume Chazarain <guichaz@yahoo.fr> LKML-Reference: <4AE6A513.6010100@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Fix affinity logic in select_task_rq_fair()Mike Galbraith2009-11-051-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ingo Molnar reported: [ 26.804000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10 [ 26.808000] caller is vmstat_update+0x26/0x70 [ 26.812000] Pid: 10, comm: events/1 Not tainted 2.6.32-rc5 #6887 [ 26.816000] Call Trace: [ 26.820000] [<c1924a24>] ? printk+0x28/0x3c [ 26.824000] [<c13258a0>] debug_smp_processor_id+0xf0/0x110 [ 26.824000] mount used greatest stack depth: 1464 bytes left [ 26.828000] [<c111d086>] vmstat_update+0x26/0x70 [ 26.832000] [<c1086418>] worker_thread+0x188/0x310 [ 26.836000] [<c10863b7>] ? worker_thread+0x127/0x310 [ 26.840000] [<c108d310>] ? autoremove_wake_function+0x0/0x60 [ 26.844000] [<c1086290>] ? worker_thread+0x0/0x310 [ 26.848000] [<c108cf0c>] kthread+0x7c/0x90 [ 26.852000] [<c108ce90>] ? kthread+0x0/0x90 [ 26.856000] [<c100c0a7>] kernel_thread_helper+0x7/0x10 [ 26.860000] BUG: using smp_processor_id() in preemptible [00000000] code: events/1/10 [ 26.864000] caller is vmstat_update+0x3c/0x70 Because this commit: a1f84a3: sched: Check for an idle shared cache in select_task_rq_fair() broke ->cpus_allowed. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: arjan@infradead.org Cc: <stable@kernel.org> LKML-Reference: <1257415066.12867.1.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Rate-limit newidleMike Galbraith2009-11-042-1/+25
| | | | | | | | | | Rate limit newidle to migration_cost. It's a win for all stages of sysbench oltp tests. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Check for an idle shared cache in select_task_rq_fair()Mike Galbraith2009-11-041-4/+29
| | | | | | | | | | | | | | | | When waking affine, check for an idle shared cache, and if found, wake to that CPU/sibling instead of the waker's CPU. This improves pgsql+oltp ramp up by roughly 8%. Possibly more for other loads, depending on overlap. The trade-off is a roughly 1% peak downturn if tasks are truly synchronous. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Arjan van de Ven <arjan@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: <stable@kernel.org> LKML-Reference: <1256654138.17752.7.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: Partition_sched_domains takes array of cpumask_var_tRusty Russell2009-11-043-34/+61
| | | | | | | | | | | | | | | | | | Currently partition_sched_domains() takes a 'struct cpumask *doms_new' which is a kmalloc'ed array of cpumask_t. You can't have such an array if 'struct cpumask' is undefined, as we plan for CONFIG_CPUMASK_OFFSTACK=y. So, we make this an array of cpumask_var_t instead: this is the same for the CONFIG_CPUMASK_OFFSTACK=n case, but requires multiple allocations for the CONFIG_CPUMASK_OFFSTACK=y case. Hence we add alloc_sched_domains() and free_sched_domains() functions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <200911031453.40668.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: Simplify sched_rt.cRusty Russell2009-11-041-37/+24Star
| | | | | | | | | | | | | | find_lowest_rq() wants to call pick_optimal_cpu() on the intersection of sched_domain_span(sd) and lowest_mask. Rather than doing a cpus_and into a temporary, we can open-code it. This actually makes the code slightly clearer, IMHO. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Gregory Haskins <ghaskins@novell.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <200911031453.15350.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Add USER_SCHED to feature removal listDhaval Giani2009-11-041-0/+15
| | | | | | | | | | | | | | | | | | | | | | | Peter Zijlstra suggested that we remove USER_SCHED at: http://lkml.org/lkml/2009/3/21/67 Removing USER_SCHED removes a lot of code from the scheduler and simplifies the code. We already have the ability to do user based classification which is tightened using PAM in userspace. Schedule USER_SCHED for removal in 2.6.34 Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> LKML-Reference: <20091103214544.GI5495@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Remove unused cpu_nr_migrations()Hiroshi Shimamoto2009-11-042-12/+0Star
| | | | | | | | | cpu_nr_migrations() is not used, remove it. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4AF12A66.6020609@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Remove unused time_sync_thresh declarationHiroshi Shimamoto2009-11-041-2/+0Star
| | | | | | | | | time_sync_thresh had been removed. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4AF12A3A.5050200@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Remove unused __schedule() declarationHiroshi Shimamoto2009-11-042-2/+1Star
| | | | | | | | | __schedule() had been removed. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4AF129C8.3030008@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched, cpuacct: Fix niced guest time accountingRyota Ozaki2009-10-254-9/+23
| | | | | | | | | | | | | | | | | | | | | | | | | CPU time of a guest is always accounted in 'user' time without concern for the nice value of its counterpart process although the guest is scheduled under the nice value. This patch fixes the defect and accounts cpu time of a niced guest in 'nice' time as same as a niced process. And also the patch adds 'guest_nice' to cpuacct. The value provides niced guest cpu time which is like 'nice' to 'user'. The original discussions can be found here: http://www.mail-archive.com/kvm@vger.kernel.org/msg23982.html http://www.mail-archive.com/kvm@vger.kernel.org/msg23860.html Signed-off-by: Ryota Ozaki <ozaki.ryota@gmail.com> Acked-by: Avi Kivity <avi@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1256314810-7897-1-git-send-email-ozaki.ryota@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'linus' into sched/coreIngo Molnar2009-10-255200-140121/+413844
|\ | | | | | | | | | | | | | | | | Conflicts: fs/proc/array.c Merge reason: resolve conflict and queue up dependent patch. Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linusLinus Torvalds2009-10-2314-48/+12Star
| |\ | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: move virtrng_remove to .devexit.text move virtballoon_remove to .devexit.text virtio_blk: Revert serial number support virtio: let header files include virtio_ids.h virtio_blk: revert QUEUE_FLAG_VIRT addition
| | * move virtrng_remove to .devexit.textUwe Kleine-König2009-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function virtrng_remove is used only wrapped by __devexit_p so define it using __devexit. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Sam Ravnborg <sam@ravnborg.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S. Tsirkin <mst@redhat.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * move virtballoon_remove to .devexit.textUwe Kleine-König2009-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function virtballoon_remove is used only wrapped by __devexit_p so define it using __devexit. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Sam Ravnborg <sam@ravnborg.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * virtio_blk: Revert serial number supportRusty Russell2009-10-222-38/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts "Add serial number support for virtio_blk, V4a". Turns out that virtio_pci, lguest and s/390 all have an 8 bit limit on virtio config space, so noone could ever use this. This is coming back later in a cleaner form. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: john cooper <john.cooper@redhat.com> Cc: Jens Axboe <jens.axboe@oracle.com>
| | * virtio: let header files include virtio_ids.hChristian Borntraeger2009-10-2214-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rusty, commit 3ca4f5ca73057a617f9444a91022d7127041970a virtio: add virtio IDs file moved all device IDs into a single file. While the change itself is a very good one, it can break userspace applications. For example if a userspace tool wanted to get the ID of virtio_net it used to include virtio_net.h. This does no longer work, since virtio_net.h does not include virtio_ids.h. This patch moves all "#include <linux/virtio_ids.h>" from the C files into the header files, making the header files compatible with the old ones. In addition, this patch exports virtio_ids.h to userspace. CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * virtio_blk: revert QUEUE_FLAG_VIRT additionChristoph Hellwig2009-10-221-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems like the addition of QUEUE_FLAG_VIRT caueses major performance regressions for Fedora users: https://bugzilla.redhat.com/show_bug.cgi?id=509383 https://bugzilla.redhat.com/show_bug.cgi?id=505695 while I can't reproduce those extreme regressions myself I think the flag is wrong. Rationale: QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT which casus the queue unplugged immediately. This is not a good behaviour for at least qemu and kvm where we do have significant overhead for every I/O operations. Even with all the latested speeups (native AIO, MSI support, zero copy) we can only get native speed for up to 128kb I/O requests we already are down to 66% of native performance for 4kb requests even on my laptop running the Intel X25-M SSD for which the QUEUE_FLAG_NONROT was designed. If we ever get virtio-blk overhead low enough that this flag makes sense it should only be set based on a feature flag set by the host. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| * | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-10-2320-86/+212
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits) niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST KS8851: Fix MAC address write order KS8851: Add soft reset at probe time net: fix section mismatch in fec.c net: Fix struct inet_timewait_sock bitfield annotation tcp: Try to catch MSG_PEEK bug net: Fix IP_MULTICAST_IF bluetooth: static lock key fix bluetooth: scheduling while atomic bug fix tcp: fix TCP_DEFER_ACCEPT retrans calculation tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT tcp: accept socket after TCP_DEFER_ACCEPT period Revert "tcp: fix tcp_defer_accept to consider the timeout" AF_UNIX: Fix deadlock on connecting to shutdown socket ethoc: clear only pending irqs ethoc: inline regs access vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs() be2net: fix support for PCI hot plug ...
| | * | niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was ↵Joyce Yu2009-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | copied to the head buffer in the Vlan packets case Signed-off-by: Joyce Yu <joyce.yu@sun.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICASTBen Dooks2009-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ks8851_set_rx_mode() the case handling IFF_MULTICAST was also setting the RXCR1_AE bit by accident. This meant that all unicast frames where being accepted by the device. Remove RXCR1_AE from this case. Note, RXCR1_AE was also masking a problem with setting the MAC address properly, so needs to be applied after fixing the MAC write order. Fixes a bug reported by Doong, Ping of Micrel. This version of the patch avoids setting RXCR1_ME for all cases. Signed-off-by: Ben Dooks <ben@simtec.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | KS8851: Fix MAC address write orderBen Dooks2009-10-212-4/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MAC address register was being written in the wrong order, so add a new address macro to convert mac-address byte to register address and a ks8851_wrreg8() function to write each byte without having to worry about any difficult byte swapping. Fixes a bug reported by Doong, Ping of Micrel. Signed-off-by: Ben Dooks <ben@simtec.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | KS8851: Add soft reset at probe timeBen Dooks2009-10-211-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue a full soft reset at probe time. This was reported by Doong Ping of Micrel, but no explanation of why this is necessary or what bug it is fixing. Add it as it does not seem to hurt the current driver and ensures that the device is in a known state when we start setting it up. Signed-off-by: Ben Dooks <ben@simtec.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: fix section mismatch in fec.cSteven King2009-10-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fec_enet_init is called by both fec_probe and fec_resume, so it shouldn't be marked as __init. Signed-off-by: Steven King <sfking@fdwdc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: Fix struct inet_timewait_sock bitfield annotationEric Dumazet2009-10-201-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 9e337b0f (net: annotate inet_timewait_sock bitfields) added 4/8 bytes in struct inet_timewait_sock. Fix this by declaring tw_ipv6_offset in the 'flags' bitfield The 14 bits hole is named tw_pad to make it cleary apparent. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | tcp: Try to catch MSG_PEEK bugHerbert Xu2009-10-201-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch tries to print out more information when we hit the MSG_PEEK bug in tcp_recvmsg. It's been around since at least 2005 and it's about time that we finally fix it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | net: Fix IP_MULTICAST_IFEric Dumazet2009-10-202-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ipv4/ipv6 setsockopt(IP_MULTICAST_IF) have dubious __dev_get_by_index() calls. This function should be called only with RTNL or dev_base_lock held, or reader could see a corrupt hash chain and eventually enter an endless loop. Fix is to call dev_get_by_index()/dev_put(). If this happens to be performance critical, we could define a new dev_exist_by_index() function to avoid touching dev refcount. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | bluetooth: static lock key fixDave Young2009-10-201-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When shutdown ppp connection, lockdep waring about non-static key will happen, it is caused by the lock is not initialized properly at that time. Fix with tuning the lock/skb_queue_head init order [ 94.339261] INFO: trying to register non-static key. [ 94.342509] the code is fine but needs lockdep annotation. [ 94.342509] turning off the locking correctness validator. [ 94.342509] Pid: 0, comm: swapper Not tainted 2.6.31-mm1 #2 [ 94.342509] Call Trace: [ 94.342509] [<c0248fbe>] register_lock_class+0x58/0x241 [ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73 [ 94.342509] [<c024ab34>] __lock_acquire+0xac/0xb73 [ 94.342509] [<c024b7fa>] ? lock_release_non_nested+0x17b/0x1de [ 94.342509] [<c024b662>] lock_acquire+0x67/0x84 [ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41 [ 94.342509] [<c054a857>] _spin_lock_irqsave+0x2f/0x3f [ 94.342509] [<c04cd1eb>] ? skb_dequeue+0x15/0x41 [ 94.342509] [<c04cd1eb>] skb_dequeue+0x15/0x41 [ 94.342509] [<c054a648>] ? _read_unlock+0x1d/0x20 [ 94.342509] [<c04cd641>] skb_queue_purge+0x14/0x1b [ 94.342509] [<fab94fdc>] l2cap_recv_frame+0xea1/0x115a [l2cap] [ 94.342509] [<c024b5df>] ? __lock_acquire+0xb57/0xb73 [ 94.342509] [<c0249c04>] ? mark_lock+0x1e/0x1c7 [ 94.342509] [<f8364963>] ? hci_rx_task+0xd2/0x1bc [bluetooth] [ 94.342509] [<fab95346>] l2cap_recv_acldata+0xb1/0x1c6 [l2cap] [ 94.342509] [<f8364997>] hci_rx_task+0x106/0x1bc [bluetooth] [ 94.342509] [<fab95295>] ? l2cap_recv_acldata+0x0/0x1c6 [l2cap] [ 94.342509] [<c02302c4>] tasklet_action+0x69/0xc1 [ 94.342509] [<c022fbef>] __do_softirq+0x94/0x11e [ 94.342509] [<c022fcaf>] do_softirq+0x36/0x5a [ 94.342509] [<c022fe14>] irq_exit+0x35/0x68 [ 94.342509] [<c0204ced>] do_IRQ+0x72/0x89 [ 94.342509] [<c02038ee>] common_interrupt+0x2e/0x34 [ 94.342509] [<c024007b>] ? pm_qos_add_requirement+0x63/0x9d [ 94.342509] [<c038e8a5>] ? acpi_idle_enter_bm+0x209/0x238 [ 94.342509] [<c049d238>] cpuidle_idle_call+0x5c/0x94 [ 94.342509] [<c02023f8>] cpu_idle+0x4e/0x6f [ 94.342509] [<c0534153>] rest_init+0x53/0x55 [ 94.342509] [<c0781894>] start_kernel+0x2f0/0x2f5 [ 94.342509] [<c0781091>] i386_start_kernel+0x91/0x96 Reported-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Tested-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | bluetooth: scheduling while atomic bug fixDave Young2009-10-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to driver core changes dev_set_drvdata will call kzalloc which should be in might_sleep context, but hci_conn_add will be called in atomic context Like dev_set_name move dev_set_drvdata to work queue function. oops as following: Oct 2 17:41:59 darkstar kernel: [ 438.001341] BUG: sleeping function called from invalid context at mm/slqb.c:1546 Oct 2 17:41:59 darkstar kernel: [ 438.001345] in_atomic(): 1, irqs_disabled(): 0, pid: 2133, name: sdptool Oct 2 17:41:59 darkstar kernel: [ 438.001348] 2 locks held by sdptool/2133: Oct 2 17:41:59 darkstar kernel: [ 438.001350] #0: (sk_lock-AF_BLUETOOTH-BTPROTO_L2CAP){+.+.+.}, at: [<faa1d2f5>] lock_sock+0xa/0xc [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001360] #1: (&hdev->lock){+.-.+.}, at: [<faa20e16>] l2cap_sock_connect+0x103/0x26b [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001371] Pid: 2133, comm: sdptool Not tainted 2.6.31-mm1 #2 Oct 2 17:41:59 darkstar kernel: [ 438.001373] Call Trace: Oct 2 17:41:59 darkstar kernel: [ 438.001381] [<c022433f>] __might_sleep+0xde/0xe5 Oct 2 17:41:59 darkstar kernel: [ 438.001386] [<c0298843>] __kmalloc+0x4a/0x15a Oct 2 17:41:59 darkstar kernel: [ 438.001392] [<c03f0065>] ? kzalloc+0xb/0xd Oct 2 17:41:59 darkstar kernel: [ 438.001396] [<c03f0065>] kzalloc+0xb/0xd Oct 2 17:41:59 darkstar kernel: [ 438.001400] [<c03f04ff>] device_private_init+0x15/0x3d Oct 2 17:41:59 darkstar kernel: [ 438.001405] [<c03f24c5>] dev_set_drvdata+0x18/0x26 Oct 2 17:41:59 darkstar kernel: [ 438.001414] [<fa51fff7>] hci_conn_init_sysfs+0x40/0xd9 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001422] [<fa51cdc0>] ? hci_conn_add+0x128/0x186 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001429] [<fa51ce0f>] hci_conn_add+0x177/0x186 [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001437] [<fa51cf8a>] hci_connect+0x3c/0xfb [bluetooth] Oct 2 17:41:59 darkstar kernel: [ 438.001442] [<faa20e87>] l2cap_sock_connect+0x174/0x26b [l2cap] Oct 2 17:41:59 darkstar kernel: [ 438.001448] [<c04c8df5>] sys_connect+0x60/0x7a Oct 2 17:41:59 darkstar kernel: [ 438.001453] [<c024b703>] ? lock_release_non_nested+0x84/0x1de Oct 2 17:41:59 darkstar kernel: [ 438.001458] [<c028804b>] ? might_fault+0x47/0x81 Oct 2 17:41:59 darkstar kernel: [ 438.001462] [<c028804b>] ? might_fault+0x47/0x81 Oct 2 17:41:59 darkstar kernel: [ 438.001468] [<c033361f>] ? __copy_from_user_ll+0x11/0xce Oct 2 17:41:59 darkstar kernel: [ 438.001472] [<c04c9419>] sys_socketcall+0x82/0x17b Oct 2 17:41:59 darkstar kernel: [ 438.001477] [<c020329d>] syscall_call+0x7/0xb Signed-off-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | tcp: fix TCP_DEFER_ACCEPT retrans calculationJulian Anastasov2009-10-201-12/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix TCP_DEFER_ACCEPT conversion between seconds and retransmission to match the TCP SYN-ACK retransmission periods because the time is converted to such retransmissions. The old algorithm selects one more retransmission in some cases. Allow up to 255 retransmissions. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPTJulian Anastasov2009-10-201-3/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change SYN-ACK retransmitting code for the TCP_DEFER_ACCEPT users to not retransmit SYN-ACKs during the deferring period if ACK from client was received. The goal is to reduce traffic during the deferring period. When the period is finished we continue with sending SYN-ACKs (at least one) but this time any traffic from client will change the request to established socket allowing application to terminate it properly. Also, do not drop acked request if sending of SYN-ACK fails. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | tcp: accept socket after TCP_DEFER_ACCEPT periodJulian Anastasov2009-10-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Willy Tarreau and many other folks in recent years were concerned what happens when the TCP_DEFER_ACCEPT period expires for clients which sent ACK packet. They prefer clients that actively resend ACK on our SYN-ACK retransmissions to be converted from open requests to sockets and queued to the listener for accepting after the deferring period is finished. Then application server can decide to wait longer for data or to properly terminate the connection with FIN if read() returns EAGAIN which is an indication for accepting after the deferring period. This change still can have side effects for applications that expect always to see data on the accepted socket. Others can be prepared to work in both modes (with or without TCP_DEFER_ACCEPT period) and their data processing can ignore the read=EAGAIN notification and to allocate resources for clients which proved to have no data to send during the deferring period. OTOH, servers that use TCP_DEFER_ACCEPT=1 as flag (not as a timeout) to wait for data will notice clients that didn't send data for 3 seconds but that still resend ACKs. Thanks to Willy Tarreau for the initial idea and to Eric Dumazet for the review and testing the change. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | Revert "tcp: fix tcp_defer_accept to consider the timeout"David S. Miller2009-10-201-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 6d01a026b7d3009a418326bdcf313503a314f1ea. Julian Anastasov, Willy Tarreau and Eric Dumazet have come up with a more correct way to deal with this. Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | AF_UNIX: Fix deadlock on connecting to shutdown socketTomoki Sekiyama2009-10-191-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found a deadlock bug in UNIX domain socket, which makes able to DoS attack against the local machine by non-root users. How to reproduce: 1. Make a listening AF_UNIX/SOCK_STREAM socket with an abstruct namespace(*), and shutdown(2) it. 2. Repeat connect(2)ing to the listening socket from the other sockets until the connection backlog is full-filled. 3. connect(2) takes the CPU forever. If every core is taken, the system hangs. PoC code: (Run as many times as cores on SMP machines.) int main(void) { int ret; int csd; int lsd; struct sockaddr_un sun; /* make an abstruct name address (*) */ memset(&sun, 0, sizeof(sun)); sun.sun_family = PF_UNIX; sprintf(&sun.sun_path[1], "%d", getpid()); /* create the listening socket and shutdown */ lsd = socket(AF_UNIX, SOCK_STREAM, 0); bind(lsd, (struct sockaddr *)&sun, sizeof(sun)); listen(lsd, 1); shutdown(lsd, SHUT_RDWR); /* connect loop */ alarm(15); /* forcely exit the loop after 15 sec */ for (;;) { csd = socket(AF_UNIX, SOCK_STREAM, 0); ret = connect(csd, (struct sockaddr *)&sun, sizeof(sun)); if (-1 == ret) { perror("connect()"); break; } puts("Connection OK"); } return 0; } (*) Make sun_path[0] = 0 to use the abstruct namespace. If a file-based socket is used, the system doesn't deadlock because of context switches in the file system layer. Why this happens: Error checks between unix_socket_connect() and unix_wait_for_peer() are inconsistent. The former calls the latter to wait until the backlog is processed. Despite the latter returns without doing anything when the socket is shutdown, the former doesn't check the shutdown state and just retries calling the latter forever. Patch: The patch below adds shutdown check into unix_socket_connect(), so connect(2) to the shutdown socket will return -ECONREFUSED. Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com> Signed-off-by: Masanori Yoshida <masanori.yoshida.tv@hitachi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | ethoc: clear only pending irqsThomas Chou2009-10-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixed the problem of dropped packets due to lost of interrupt requests. We should only clear what was pending at the moment we read the irq source reg. Signed-off-by: Thomas Chou <thomas@wytron.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | ethoc: inline regs accessThomas Chou2009-10-191-9/+10
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Thomas Chou <thomas@wytron.com.tw> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=nRandy Dunlap2009-10-172-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmxnet3 was using dprintk() for debugging output. This was defined in <linux/dst.h> and was the only thing that was used from that header file. This caused compile errors when CONFIG_BLOCK was not enabled due to bio* and BIO* uses in the header file, so change this driver to use dev_dbg() for debugging output. include/linux/dst.h:520: error: dereferencing pointer to incomplete type include/linux/dst.h:520: error: 'BIO_POOL_BITS' undeclared (first use in this function) include/linux/dst.h:521: error: dereferencing pointer to incomplete type include/linux/dst.h:522: error: dereferencing pointer to incomplete type include/linux/dst.h:525: error: dereferencing pointer to incomplete type make[4]: *** [drivers/net/vmxnet3/vmxnet3_drv.o] Error 1 Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Bhavesh Davda <bhavesh@vmware.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()Eric Dumazet2009-10-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because netpoll can call netdevice start_xmit() method with irqs disabled, drivers should not call kfree_skb() from their start_xmit(), but use dev_kfree_skb_any() instead. Oct 8 11:16:52 172.30.1.31 [113074.791813] ------------[ cut here ]------------ Oct 8 11:16:52 172.30.1.31 [113074.791813] WARNING: at net/core/skbuff.c:398 \ skb_release_head_state+0x64/0xc8() Oct 8 11:16:52 172.30.1.31 [113074.791813] Hardware name: Oct 8 11:16:52 172.30.1.31 [113074.791813] Modules linked in: netconsole ocfs2 jbd2 quota_tree \ ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs crc32c drbd cn loop \ serio_raw psmouse snd_pcm snd_timer snd soundcore snd_page_alloc virtio_net pcspkr parport_pc parport \ i2c_piix4 i2c_core button processor evdev ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot \ dm_mod ide_cd_mod cdrom ata_generic ata_piix virtio_blk libata scsi_mod piix ide_pci_generic ide_core \ virtio_pci virtio_ring virtio floppy thermal fan thermal_sys [last unloaded: netconsole] Oct 8 11:16:52 172.30.1.31 [113074.791813] Pid: 11132, comm: php5-cgi Tainted: G W \ 2.6.31.2-vserver #1 Oct 8 11:16:52 172.30.1.31 [113074.791813] Call Trace: Oct 8 11:16:52 172.30.1.31 [113074.791813] <IRQ> [<ffffffff81253cd5>] ? \ skb_release_head_state+0x64/0xc8 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253cd5>] ? skb_release_head_state+0x64/0xc8 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81049ae1>] ? warn_slowpath_common+0x77/0xa3 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253cd5>] ? skb_release_head_state+0x64/0xc8 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81253a1a>] ? __kfree_skb+0x9/0x7d Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa01cb139>] ? free_old_xmit_skbs+0x51/0x6e \ [virtio_net] Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa01cbc85>] ? start_xmit+0x26/0xf2 [virtio_net] Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8126934f>] ? netpoll_send_skb+0xd2/0x205 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffffa0429216>] ? write_msg+0x90/0xeb [netconsole] Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81049f06>] ? __call_console_drivers+0x5e/0x6f Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8102b49d>] ? kvm_clock_read+0x4d/0x52 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8104a082>] ? release_console_sem+0x115/0x1ba Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8104a632>] ? vprintk+0x2f2/0x34b Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8106b142>] ? vx_update_load+0x18/0x13e Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81308309>] ? printk+0x4e/0x5d Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff8102b49d>] ? kvm_clock_read+0x4d/0x52 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81070b62>] ? getnstimeofday+0x55/0xaf Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81062683>] ? ktime_get_ts+0x21/0x49 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff810626b7>] ? ktime_get+0xc/0x41 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81062788>] ? hrtimer_interrupt+0x9c/0x146 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81024a4b>] ? smp_apic_timer_interrupt+0x80/0x93 Oct 8 11:16:52 172.30.1.31 [113074.791813] [<ffffffff81011663>] ? apic_timer_interrupt+0x13/0x20 Oct 8 11:16:52 172.30.1.31 [113074.791813] <EOI> [<ffffffff8130a9eb>] ? _spin_unlock_irq+0xd/0x31 Reported-and-tested-by: Massimo Cetra <mcetra@navynet.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Bug-Entry: http://bugzilla.kernel.org/show_bug.cgi?id=14378 Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | be2net: fix support for PCI hot plugSathya Perla2009-10-152-11/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before issuing any cmds to the FW, the driver must first wait till the fW becomes ready. This is needed for PCI hot plug when the driver can be probed while the card fw is being initialized. Signed-off-by: Sathya Perla <sathyap@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | be2net: fix promiscuous and multicast promiscuous modes being enabled alwaysSathya Perla2009-10-153-14/+18
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Sathya Perla <sathyap@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notifyLinus Torvalds2009-10-224-5/+14
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.infradead.org/users/eparis/notify: dnotify: ignore FS_EVENT_ON_CHILD inotify: fix coalesce duplicate events into a single event in special case inotify: deprecate the inotify kernel interface fsnotify: do not set group for a mark before it is on the i_list
| | * | | dnotify: ignore FS_EVENT_ON_CHILDAndreas Gruenbacher2009-10-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mask off FS_EVENT_ON_CHILD in dnotify_handle_event(). Otherwise, when there is more than one watch on a directory and dnotify_should_send_event() succeeds, events with FS_EVENT_ON_CHILD set will trigger all watches and cause spurious events. This case was overlooked in commit e42e2773. #define _GNU_SOURCE #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <signal.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> static void create_event(int s, siginfo_t* si, void* p) { printf("create\n"); } static void delete_event(int s, siginfo_t* si, void* p) { printf("delete\n"); } int main (void) { struct sigaction action; char *tmpdir, *file; int fd1, fd2; sigemptyset (&action.sa_mask); action.sa_flags = SA_SIGINFO; action.sa_sigaction = create_event; sigaction (SIGRTMIN + 0, &action, NULL); action.sa_sigaction = delete_event; sigaction (SIGRTMIN + 1, &action, NULL); # define TMPDIR "/tmp/test.XXXXXX" tmpdir = malloc(strlen(TMPDIR) + 1); strcpy(tmpdir, TMPDIR); mkdtemp(tmpdir); # define TMPFILE "/file" file = malloc(strlen(tmpdir) + strlen(TMPFILE) + 1); sprintf(file, "%s/%s", tmpdir, TMPFILE); fd1 = open (tmpdir, O_RDONLY); fcntl(fd1, F_SETSIG, SIGRTMIN); fcntl(fd1, F_NOTIFY, DN_MULTISHOT | DN_CREATE); fd2 = open (tmpdir, O_RDONLY); fcntl(fd2, F_SETSIG, SIGRTMIN + 1); fcntl(fd2, F_NOTIFY, DN_MULTISHOT | DN_DELETE); if (fork()) { /* This triggers a create event */ creat(file, 0600); /* This triggers a create and delete event (!) */ unlink(file); } else { sleep(1); rmdir(tmpdir); } return 0; } Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| | * | | inotify: fix coalesce duplicate events into a single event in special caseWei Yongjun2009-10-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we do rename a dir entry, like this: rename("/tmp/ino7UrgoJ.rename1", "/tmp/ino7UrgoJ.rename2") rename("/tmp/ino7UrgoJ.rename2", "/tmp/ino7UrgoJ") The duplicate events should be coalesced into a single event. But those two events do not be coalesced into a single event, due to some bad check in event_compare(). It can not match the two NULL inodes as the same event. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Eric Paris <eparis@redhat.com>