summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* sched: Avoid side-effect of tickless idle on update_cpu_loadVenkatesh Pallipadi2010-06-092-6/+99
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tickless idle has a negative side effect on update_cpu_load(), which in turn can affect load balancing behavior. update_cpu_load() is supposed to be called every tick, to keep track of various load indicies. With tickless idle, there are no scheduler ticks called on the idle CPUs. Idle CPUs may still do load balancing (with idle_load_balance CPU) using the stale cpu_load. It will also cause problems when all CPUs go idle for a while and become active again. In this case loads would not degrade as expected. This is how rq->nr_load_updates change looks like under different conditions: <cpu_num> <nr_load_updates change> All CPUS idle for 10 seconds (HZ=1000) 0 1621 10 496 11 139 12 875 13 1672 14 12 15 21 1 1472 2 2426 3 1161 4 2108 5 1525 6 701 7 249 8 766 9 1967 One CPU busy rest idle for 10 seconds 0 10003 10 601 11 95 12 966 13 1597 14 114 15 98 1 3457 2 93 3 6679 4 1425 5 1479 6 595 7 193 8 633 9 1687 All CPUs busy for 10 seconds 0 10026 10 10026 11 10026 12 10026 13 10025 14 10025 15 10025 1 10026 2 10026 3 10026 4 10026 5 10026 6 10026 7 10026 8 10026 9 10026 That is update_cpu_load works properly only when all CPUs are busy. If all are idle, all the CPUs get way lower updates. And when few CPUs are busy and rest are idle, only busy and ilb CPU does proper updates and rest of the idle CPUs will do lower updates. The patch keeps track of when a last update was done and fixes up the load avg based on current time. On one of my test system SPECjbb with warehouse 1..numcpus, patch improves throughput numbers by ~1% (average of 6 runs). On another test system (with different domain hierarchy) there is no noticable change in perf. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> LKML-Reference: <AANLkTilLtDWQsAUrIxJ6s04WTgmw9GuOODc5AOrYsaR5@mail.gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: Simplify the reacquire_kernel_lock() logicOleg Nesterov2010-06-091-7/+6Star
| | | | | | | | | | | | | | | | | | | | | - Contrary to what 6d558c3a says, there is no need to reload prev = rq->curr after the context switch. You always schedule back to where you came from, prev must be equal to current even if cpu/rq was changed. - This also means reacquire_kernel_lock() can use prev instead of current. - No need to reassign switch_count if reacquire_kernel_lock() reports need_resched(), we can just move the initial assignment down, under the "need_resched_nonpreemptible:" label. - Try to update the comment after context_switch(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100519125711.GA30199@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched_clock: Add local_clock() API and improve documentationPeter Zijlstra2010-06-098-34/+113
| | | | | | | | | | | | | | | | For people who otherwise get to write: cpu_clock(smp_processor_id()), there is now: local_clock(). Also, as per suggestion from Andrew, provide some documentation on the various clock interfaces, and minimize the unsigned long long vs u64 mess. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jens Axboe <jaxboe@fusionio.com> LKML-Reference: <1275052414.1645.52.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'sched-wq' of ↵Ingo Molnar2010-06-089-81/+203
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq into sched/core
| * sched: add hooks for workqueueTejun Heo2010-06-084-3/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Concurrency managed workqueue needs to know when workers are going to sleep and waking up. Using these two hooks, cmwq keeps track of the current concurrency level and throttles execution of new works if it's too high and wakes up another worker from the sleep hook if it becomes too low. This patch introduces PF_WQ_WORKER to identify workqueue workers and adds the following two hooks. * wq_worker_waking_up(): called when a worker is woken up. * wq_worker_sleeping(): called when a worker is going to sleep and may return a pointer to a local task which should be woken up. The returned task is woken up using try_to_wake_up_local() which is simplified ttwu which is called under rq lock and can only wake up local tasks. Both hooks are currently defined as noop in kernel/workqueue_sched.h. Later cmwq implementation will replace them with proper implementation. These hooks are hard coded as they'll always be enabled. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu>
| * sched: refactor try_to_wake_up()Tejun Heo2010-06-081-34/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factor ttwu_activate() and ttwu_woken_up() out of try_to_wake_up(). The factoring out doesn't affect try_to_wake_up() much code-generation-wise. Depending on configuration options, it ends up generating the same object code as before or slightly different one due to different register assignment. This is to help future implementation of try_to_wake_up_local(). Mike Galbraith suggested rename to ttwu_post_activation() from ttwu_woken_up() and comment update in try_to_wake_up(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu>
| * sched: adjust when cpu_active and cpuset configurations are updated during ↵Tejun Heo2010-06-085-42/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpu on/offlining Currently, when a cpu goes down, cpu_active is cleared before CPU_DOWN_PREPARE starts and cpuset configuration is updated from a default priority cpu notifier. When a cpu is coming up, it's set before CPU_ONLINE but cpuset configuration again is updated from the same cpu notifier. For cpu notifiers, this presents an inconsistent state. Threads which a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be migrated to other cpus because the cpu is no more inactive. Fix it by updating cpu_active in the highest priority cpu notifier and cpuset configuration in the second highest when a cpu is coming up. Down path is updated similarly. This guarantees that all other cpu notifiers see consistent cpu_active and cpuset configuration. cpuset_track_online_cpus() notifier is converted to cpuset_update_active_cpus() which just updates the configuration and now called from cpuset_cpu_[in]active() notifiers registered from sched_init_smp(). If cpuset is disabled, cpuset_update_active_cpus() degenerates into partition_sched_domains() making separate notifier for !CONFIG_CPUSETS unnecessary. This problem is triggered by cmwq. During CPU_DOWN_PREPARE, hotplug callback creates a kthread and kthread_bind()s it to the target cpu, and the thread is expected to run on that cpu. * Ingo's test discovered __cpuinit/exit markups were incorrect. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Menage <menage@google.com>
| * sched: define and use CPU_PRI_* enums for cpu notifier prioritiesTejun Heo2010-06-083-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | Instead of hardcoding priority 10 and 20 in sched and perf, collect them into CPU_PRI_* enums. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
* | sched: Fix PROVE_RCU vs cpu_cgroupPeter Zijlstra2010-06-082-62/+73
|/ | | | | | | | | | | | | | | | | | | | | | | | PROVE_RCU has a few issues with the cpu_cgroup because the scheduler typically holds rq->lock around the css rcu derefs but the generic cgroup code doesn't (and can't) know about that lock. Provide means to add extra checks to the css dereference and use that in the scheduler to annotate its users. The addition of rq->lock to these checks is correct because the cgroup_subsys::attach() method takes the rq->lock for each task it moves, therefore by holding that lock, we ensure the task is pinned to the current cgroup and the RCU derefence is valid. That leaves one genuine race in __sched_setscheduler() where we used task_group() without holding any of the required locks and thus raced with the cgroup code. Solve this by moving the check under the appropriate lock. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge git://git.infradead.org/~dwmw2/mtd-2.6.35Linus Torvalds2010-06-086-90/+106
|\ | | | | | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/~dwmw2/mtd-2.6.35: jffs2: update ctime when changing the file's permission by setfacl jffs2: Fix NFS race by using insert_inode_locked() jffs2: Fix in-core inode leaks on error paths mtd: Fix NAND submenu mtd/r852: update card detect early. mtd/r852: Fixes in case of DMA timeout mtd/r852: register IRQ as last step drivers/mtd: Use memdup_user docbook: make mtd nand module init static
| * jffs2: update ctime when changing the file's permission by setfaclJan Kara2010-06-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | jffs2 didn't update the ctime of the file when its permission was changed. Steps to reproduce: # touch aaa # stat -c %Z aaa 1275289822 # setfacl -m 'u::x,g::x,o::x' aaa # stat -c %Z aaa 1275289822 <- unchanged But, according to the spec of the ctime, jffs2 must update it. Port of ext3 patch by Miao Xie <miaox@cn.fujitsu.com>. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * jffs2: Fix NFS race by using insert_inode_locked()David Woodhouse2010-06-032-3/+16
| | | | | | | | | | | | | | | | | | New inodes need to be locked as we're creating them, so they don't get used by other things (like NFSd) before they're ready. Pointed out by Al Viro. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * jffs2: Fix in-core inode leaks on error pathsDavid Woodhouse2010-06-031-59/+56Star
| | | | | | | | | | | | Pointed out by Al Viro. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * mtd: Fix NAND submenuMaxim Levitsky2010-06-031-10/+11
| | | | | | | | | | | | | | | | Move MTD_NAND_ECC and MTD_NAND_ECC_SMC above NAND memuconfig, to unbreak display in xconfig. This shouldn't change any dependencies. Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * mtd/r852: update card detect early.Maxim Levitsky2010-06-021-1/+1
| | | | | | | | | | | | | | | | This turns out to be the reason for DMA timeouts on resume, if card was inserted while system was suspended Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * mtd/r852: Fixes in case of DMA timeoutMaxim Levitsky2010-06-021-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * Don't call complete on dma completion * do a INIT_COMPLETE before using it each time * Report DMA read error via ecc 'correct' I finally managed to make my system do suspend to ram propertly, and I see that if card was inserted during suspend (while system was off), I get dma timeouts on resume. Simple card reinsert solves the issue. This patch solves a crash that would happen otherwise Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * mtd/r852: register IRQ as last stepMaxim Levitsky2010-06-021-5/+6
| | | | | | | | | | | | | | | | Otherwise, if it fires right away, it might access uninitialized spinlock Signed-off-by: Maxim Levitsky <maximlevitsky@gmail.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * drivers/mtd: Use memdup_userJulia Lawall2010-05-221-8/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use memdup_user when user data is immediately copied into the allocated region. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression from,to,size,flag; position p; identifier l1,l2; @@ - to = \(kmalloc@p\|kzalloc@p\)(size,flag); + to = memdup_user(from,size); if ( - to==NULL + IS_ERR(to) || ...) { <+... when != goto l1; - -ENOMEM + PTR_ERR(to) ...+> } - if (copy_from_user(to, from, size) != 0) { - <+... when != goto l2; - -EFAULT - ...+> - } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * docbook: make mtd nand module init staticH Hartley Sweeten2010-05-211-1/+1
| | | | | | | | | | | | | | | | In the example the module_init function should be static. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | Merge branch 'upstream-linus' of ↵Linus Torvalds2010-06-085-27/+30
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: ahci: redo stopping DMA engines on empty ports sata_sil24: fix kernel panic on ARM caused by unaligned access in sata_sil24 ahci: add pci quirk for JMB362 sata_via: explain the magic fix
| * | ahci: redo stopping DMA engines on empty portsTejun Heo2010-06-071-18/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 96d60303fd (ahci: Turn off DMA engines when there's no device) implemented stopping DMA engines on empty ports but it used single sampling of status registers to determine device presence which led to disabling of DMA engines on occupied ports. Do it after all EH actions are complete using device presence state determined by EH. This avoids spurious disabling of DMA engines and simplifies the code. Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Marc Dionne <marc.c.dionne@gmail.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Robert Hancock <hancockrwd@gmail.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| * | sata_sil24: fix kernel panic on ARM caused by unaligned access in sata_sil24Colin Tuckley2010-06-071-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sata_sil24 driver has six 16-bit registers that are initialised with 32-bit writes. This cause a kernel panic on ARM due to the unaligned accesses which result. This patch changes the accesses to the correct 16-bit ones. Signed-off-by: Colin Tuckley <colin.tuckley@arm.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| * | ahci: add pci quirk for JMB362Tejun Heo2010-06-072-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | JMB362 is a new variant of jmicron controller which is similar to JMB360 but has two SATA ports instead of one. As there is no PATA port, single function AHCI mode can be used as in JMB360. Add pci quirk for JMB362. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Aries Lee <arieslee@jmicron.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| * | sata_via: explain the magic fixTejun Heo2010-06-071-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | Add Joseph Chan's explanation of the problem and workaround to the VT6421 magic fix. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Joseph Chan <JosephChan@via.com.tw> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | | [PATCH 2/11] drivers/watchdog: Eliminate a NULL pointer dereferenceJulia Lawall2010-06-071-1/+1
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the point of the call to dev_err, wm8350 is NULL. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ expression E,E1; identifier f; statement S1,S2,S3; @@ if ((E == NULL && ...) || ...) { ... when != if (...) S1 else S2 when != E = E1 * E->f ... when any return ...; } else S3 // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
* | Revert "tty: fix a little bug in scrup, vt.c"Linus Torvalds2010-06-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 962400e8fd29981a7b166e463dd143b6ac6a3e76, which was entirely bogus. The code used to multiply the character offset by "vc->vc_cols", and that's actually correct, because 'd' itself is an 'unsigned short'. So the pointer arithmetic already takes the size of a VGA character into account. Changing it to use vc_size_row (which is just "vc_cols" shifted up to take the size of the character into account) ends up multiplying with the VGA character size twice. This got reported as bugs for various other subsystems, because what it actually results in is writing the 16-bit vc_video_erase_char pattern (usually 0x0720: 0x07 is the default attribute, 0x20 is ASCII space) into some random other allocation. So Markus ended up reporting this as a ext4 bug, while to Torsten Kaiser it looked like a problem with KMS or libata. Jeff Chua saw it in different places. And finally - Justin Mattock had slab poisoning enabled, and saw it as a slab poison overwritten. And bisected and reverted this to verify the buggy commit. Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Reported-by: Torsten Kaiser <just.for.lkml@googlemail.com> Reported-by: Jeff Chua <jeff.chua.linux@gmail.com> Reported-by: Justin P. Mattock <justinmattock@gmail.com> Reported-bisected-and-tested-by: Justin P. Mattock <justinmattock@gmail.com> Acked-by: Dave Airlie <airlied@redhat.com> Cc: Frank Pan <frankpzh@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Linux 2.6.35-rc2Linus Torvalds2010-06-061-1/+1
| |
* | drm/i915: Move non-phys cursors into the GTTChris Wilson2010-06-061-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cursors need to be in the GTT domain when being accessed by the GPU. Previously this was a fortuitous byproduct of userspace using pwrite() to upload the image data into the cursor. The redundant clflush was removed in commit 9b8c4a and so the image was no longer being flushed out of the caches into main memory. One could also devise a scenario where the cursor was rendered by the GPU, prior to being attached as the cursor, resulting in similar corruption due to the missing MI_FLUSH. Fixes: Bug 28335 - Cursor corruption caused by commit 9b8c4a0b21 https://bugs.freedesktop.org/show_bug.cgi?id=28335 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Andy Isaacson <adi@hexapodia.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for_linus' of ↵Linus Torvalds2010-06-052-17/+26
|\ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix remaining racy updates of EXT4_I(inode)->i_flags ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only files
| * | ext4: Fix remaining racy updates of EXT4_I(inode)->i_flagsDmitry Monakhov2010-06-051-17/+23
| | | | | | | | | | | | | | | | | | | | | A few functions were still modifying i_flags in a racy manner. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | ext4: Make sure the MOVE_EXT ioctl can't overwrite append-only filesTheodore Ts'o2010-06-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan Roseberg has reported a problem with the MOVE_EXT ioctl. If the donor file is an append-only file, we should not allow the operation to proceed, lest we end up overwriting the contents of an append-only file. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Dan Rosenberg <dan.j.rosenberg@gmail.com>
* | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2010-06-0518-798/+736Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: improve xfs_isilocked xfs: skip writeback from reclaim context xfs: remove done roadmap item from xfs-delayed-logging-design.txt xfs: fix race in inode cluster freeing failing to stale inodes xfs: fix access to upper inodes without inode64 xfs: fix might_sleep() warning when initialising per-ag tree fs/xfs/quota: Add missing mutex_unlock xfs: remove duplicated #include xfs: convert more trace events to DEFINE_EVENT xfs: xfs_trace.c: remove duplicated #include xfs: Check new inode size is OK before preallocating xfs: clean up xlog_align xfs: cleanup log reservation calculactions xfs: be more explicit if RT mount fails due to config xfs: replace E2BIG with EFBIG where appropriate
| * \ \ Merge branch 'master' into for-linusAlex Elder2010-06-0418-798/+736Star
| |\ \ \
| | * | | xfs: improve xfs_isilockedChristoph Hellwig2010-06-031-16/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use rwsem_is_locked to make the assertations for shared locks work. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
| | * | | xfs: skip writeback from reclaim contextChristoph Hellwig2010-06-031-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allowing writeback from reclaim context causes massive problems with stack overflows as we can call into the writeback code which tends to be a heavy stack user both in the generic code and XFS from random contexts that perform memory allocations. Follow the example of btrfs (and in slightly different form ext4) and refuse to write out data from reclaim context. This issue should really be handled by the VM so that we can tune better for this case, but until we get it sorted out there we have to hack around this in each filesystem with a complex writeback path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
| | * | | xfs: remove done roadmap item from xfs-delayed-logging-design.txtChristoph Hellwig2010-06-031-5/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
| | * | | xfs: fix race in inode cluster freeing failing to stale inodesDave Chinner2010-06-031-80/+62Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an inode cluster is freed, it needs to mark all inodes in memory as XFS_ISTALE before marking the buffer as stale. This is eeded because the inodes have a different life cycle to the buffer, and once the buffer is torn down during transaction completion, we must ensure none of the inodes get written back (which is what XFS_ISTALE does). Unfortunately, xfs_ifree_cluster() has some bugs that lead to inodes not being marked with XFS_ISTALE. This shows up when xfs_iflush() is called on these inodes either during inode reclaim or tail pushing on the AIL. The buffer is read back, but no longer contains inodes and so triggers assert failures and shutdowns. This was reproducable with at run.dbench10 invocation from xfstests. There are two main causes of xfs_ifree_cluster() failing. The first is simple - it checks in-memory inodes it finds in the per-ag icache to see if they are clean without holding the flush lock. if they are clean it skips them completely. However, If an inode is flushed delwri, it will appear clean, but is not guaranteed to be written back until the flush lock has been dropped. Hence we may have raced on the clean check and the inode may actually be dirty. Hence always mark inodes found in memory stale before we check properly if they are clean. The second is more complex, and makes the first problem easier to hit. Basically the in-memory inode scan is done with full knowledge it can be racing with inode flushing and AIl tail pushing, which means that inodes that it can't get the flush lock on might not be attached to the buffer after then in-memory inode scan due to IO completion occurring. This is actually documented in the code as "needs better interlocking". i.e. this is a zero-day bug. Effectively, the in-memory scan must be done while the inode buffer is locked and Io cannot be issued on it while we do the in-memory inode scan. This ensures that inodes we couldn't get the flush lock on are guaranteed to be attached to the cluster buffer, so we can then catch all in-memory inodes and mark them stale. Now that the inode cluster buffer is locked before the in-memory scan is done, there is no need for the two-phase update of the in-memory inodes, so simplify the code into two loops and remove the allocation of the temporary buffer used to hold locked inodes across the phases. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| | * | | xfs: fix access to upper inodes without inode64Christoph Hellwig2010-05-285-46/+21Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a filesystem is mounted without the inode64 mount option we should still be able to access inodes not fitting into 32 bits, just not created new ones. For this to work we need to make sure the inode cache radix tree is initialized for all allocation groups, not just those we plan to allocate inodes from. This patch makes sure we initialize the inode cache radix tree for all allocation groups, and also cleans xfs_initialize_perag up a bit to separate the inode32 logical from the general perag structure setup. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: fix might_sleep() warning when initialising per-ag treeDave Chinner2010-05-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of radix_tree_preload() only works if the radix tree was initialised without the __GFP_WAIT flag. The per-ag tree uses GFP_NOFS, so does not trigger allocation of new tree nodes from the preloaded array. Hence it enters the allocator with a spinlock held and triggers the might_sleep() warnings. Reported-by; Chris Mason <chris.mason@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | fs/xfs/quota: Add missing mutex_unlockJulia Lawall2010-05-281-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a mutex_unlock missing on the error path. The use of this lock is balanced elsewhere in the file. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression E1; @@ * mutex_lock(E1,...); <+... when != E1 if (...) { ... when != E1 * return ...; } ...+> * mutex_unlock(E1,...); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: remove duplicated #includeHuang Weiyi2010-05-281-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated #include('s) in fs/xfs/linux-2.6/xfs_quotaops.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: convert more trace events to DEFINE_EVENTLi Zefan2010-05-281-168/+188
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use DECLARE_EVENT_CLASS, and save ~15K: text data bss dec hex filename 171949 43028 48 215025 347f1 fs/xfs/linux-2.6/xfs_trace.o.orig 156521 43028 36 199585 30ba1 fs/xfs/linux-2.6/xfs_trace.o No change in functionality. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: xfs_trace.c: remove duplicated #includeHuang Weiyi2010-05-281-1/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated #include('s) in fs/xfs/linux-2.6/xfs_trace.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Reviewed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: Check new inode size is OK before preallocatingDave Chinner2010-05-282-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new xfsqa test 228 tries to preallocate more space than the filesystem contains. it should fail, but instead triggers an assert about lock flags. The failure is due to the size extension failing in vmtruncate() due to rlimit being set. Check this before we start the preallocation to avoid allocating space that will never be used. Also the path through xfs_vn_allocate already holds the IO lock, so it should not be present in the lock flags when the setattr fails. Hence the assert needs to take this into account. This will prevent other such callers from hitting this incorrect ASSERT. (Fixed a reference to "newsize" to read "new_size". -Alex) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: clean up xlog_alignChristoph Hellwig2010-05-281-8/+3Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add suggested cleanups to commit 29db3370a1369541d58d692fbfb168b8a0bd7f41 from review that didn't end up being commited. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: cleanup log reservation calculactionsChristoph Hellwig2010-05-282-457/+400Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of having small helper functions calling big macros do the calculations for the log reservations directly in the functions. These are mostly 1:1 from the macros execept that the macros kept the quota calculations in their callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: be more explicit if RT mount fails due to configEric Sandeen2010-05-281-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recent testers were slightly confused that a realtime mount failed due to missing CONFIG_XFS_RT; we can make that a little more obvious. V2: drop the else as suggested by Christoph Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| | * | | xfs: replace E2BIG with EFBIG where appropriateEric Sandeen2010-05-282-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many places in the xfs code return E2BIG when they really mean EFBIG; trying to grow past 16T on a 32 bit machine, for example, says "Argument list too long" rather than "File too large" which is not particularly helpful. Some of these don't make perfect sense as EFBIG either, but still better than E2BIG IMHO. Signed-off-by: Eric Sandeen <sandeen@sandeen.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2010-06-0531-205/+256
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (30 commits) X25: remove duplicated #include tcp: use correct net ns in cookie_v4_check() rps: tcp: fix rps_sock_flow_table table updates ppp_generic: fix multilink fragment sizes syncookies: remove Kconfig text line about disabled-by-default ixgbe: only check pfc bits in hang logic if pfc is enabled net: check for refcount if pop a stacked dst_entry ixgbe: return IXGBE_ERR_RAR_INDEX when out of range act_pedit: access skb->data safely sfc: Store port number in net_device::dev_id epic100: Test __BIG_ENDIAN instead of (non-existent) CONFIG_BIG_ENDIAN tehuti: return -EFAULT on copy_to_user errors isdn/kcapi: return -EFAULT on copy_from_user errors e1000e: change logical negate to bitwise sfc: Get port number from CS_PORT_NUM, not PCI function number cls_u32: use skb_header_pointer() to dereference data safely TCP: tcp_hybla: Fix integer overflow in slow start increment act_nat: fix the wrong checksum when addr isn't in old_addr/mask net/fec: fix pm to survive to suspend/resume korina: count RX DMA OVR as rx_fifo_error ...
| * | | | | X25: remove duplicated #includeHuang Weiyi2010-06-051-2/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated #include('s) in drivers/net/wan/x25_asy.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>