summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
* NFSv4.1: Replace dprintk() in pnfs_update_layout with something less buggyTrond Myklebust2012-09-281-7/+11
| | | | | | | | | | Dereferencing nfsi->layout in order to read plh_flags without holding a spin lock is bug prone. Furthermore, the dprintk() tells you nothing about whether or not the call succeeded. Replace it with something that tells you about whether or not a valid layout segment was returned for the inode in question. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: Replace get_device_info() with filelayout_get_device_info()Trond Myklebust2012-09-284-4/+4
| | | | | | Fix the namespace pollution issue. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: Cleanup; add "pnfs_" prefix to put_lseg() and get_lseg()Trond Myklebust2012-09-284-31/+31
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: Cleanup; add "pnfs_" prefix to get_layout_hdr() and put_layout_hdr()Trond Myklebust2012-09-284-22/+22
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: Cleanup add a "pnfs_" prefix to mark_matching_lsegs_invalidTrond Myklebust2012-09-283-6/+6
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Clean up the pNFS layoutget interfaceTrond Myklebust2012-09-284-17/+27
| | | | | | | | Ensure that we do return errors from nfs4_proc_layoutget() and that we don't mark the layout as having failed if the error was due to a signal or resource problem on the client side. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* SUNRPC: Get rid of the redundant xprt->shutdown bit fieldTrond Myklebust2012-09-284-40/+11Star
| | | | | | | | | It is only set after everyone has dereferenced the transport, and serves no useful purpose: setting it is racy, so all the socket code, etc still needs to be able to cope with the cases where they miss reading it. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Write the entire file if a server reboot occurs during fsync()Trond Myklebust2012-09-282-0/+14
| | | | | | | This is to ensure that we don't clear the NFS_CONTEXT_RESEND_WRITES flag while there are still writes that haven't been resent. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Fix fdatasync/fsync() when confronted with a server rebootTrond Myklebust2012-09-284-22/+36
| | | | | | | | | | | | | If the server reboots before it can commit the unstable writes to disk, then nfs_commit_release_pages() will detect this when it compares the verifier returned by COMMIT to the one returned by WRITE. When this happens, the client needs to resend those writes in order to guarantee that they make it to stable storage. This patch adds a signalling mechanism to notify fsync() that it needs to retry all writes before it can exit. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4: Convert the nfs4_lock_state->ls_flags to a bit fieldTrond Myklebust2012-09-283-11/+11
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Clean up helper function nfs4_select_rw_stateid()Trond Myklebust2012-09-288-16/+42
| | | | | | | | | | | We want to be able to pass on the information that the page was not dirtied under a lock. Instead of adding a flag parameter, do this by passing a pointer to a 'struct nfs_lock_owner' that may be NULL. Also reuse this structure in struct nfs_lock_context to carry the fl_owner_t and pid_t. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: Convert nfs_get_lock_context to return an ERR_PTR on failureTrond Myklebust2012-09-283-8/+18
| | | | | | | | We want to be able to distinguish between allocation failures, and the case where the lock context is not needed (because there are no locks). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* SUNRPC: Optimise away unnecessary data moves in xdr_align_pagesTrond Myklebust2012-09-281-10/+13
| | | | | | | | We only have to call xdr_shrink_pagelen() if the remaining RPC message does not fit in the page buffer length that we supplied to xdr_align_pages(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFSv4.1: decode_getdeviceinfo should check xdr_read_pages() return valueTrond Myklebust2012-09-261-1/+2
| | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* SUNRPC: Fix the return value of xdr_align_pages()Trond Myklebust2012-09-261-0/+2
| | | | | | | The callers of xdr_align_pages() expect it to return the number of bytes of actual XDR data remaining in the pages. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS4: avoid underflow when converting error to pointer.NeilBrown2012-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | In nfs4_create_sec_client, 'flavor' can hold a negative error code (returned from nfs4_negotiate_security), even though it is an 'enum' and hence unsigned. The code is careful to cast it to an (int) before testing if it is negative, however it doesn't cast to an (int) before calling ERR_PTR. On a machine where "void*" is larger than "int", this results in the unsigned equivalent of -1 (e.g. 0xffffffff) being converted to a pointer. Subsequent code determines that this is not negative, and so dereferences it with predictable results. So: cast 'flavor' to a (signed) int before passing to ERR_PTR. cc: Benny Halevy <bhalevy@tonian.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* NFS: fix the return value check by using IS_ERRWei Yongjun2012-09-252-2/+2
| | | | | | | | | | | | In case of error, the function rpcauth_create() returns ERR_PTR() and never returns NULL pointer. The NULL test in the return value check should be replaced with IS_ERR(). dpatch engine is used to auto generated this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* SUNRPC: Set alloc_slot for backchannel tcp opsBryan Schumaker2012-09-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | f39c1bfb5a03e2d255451bff05be0d7255298fa4 (SUNRPC: Fix a UDP transport regression) introduced the "alloc_slot" function for xprt operations, but never created one for the backchannel operations. This patch fixes a null pointer dereference when mounting NFS over v4.1. Call Trace: [<ffffffffa0207957>] ? xprt_reserve+0x47/0x50 [sunrpc] [<ffffffffa02023a4>] call_reserve+0x34/0x60 [sunrpc] [<ffffffffa020e280>] __rpc_execute+0x90/0x400 [sunrpc] [<ffffffffa020e61a>] rpc_async_schedule+0x2a/0x40 [sunrpc] [<ffffffff81073589>] process_one_work+0x139/0x500 [<ffffffff81070e70>] ? alloc_worker+0x70/0x70 [<ffffffffa020e5f0>] ? __rpc_execute+0x400/0x400 [sunrpc] [<ffffffff81073d1e>] worker_thread+0x15e/0x460 [<ffffffff8145c839>] ? preempt_schedule+0x49/0x70 [<ffffffff81073bc0>] ? rescuer_thread+0x230/0x230 [<ffffffff81079603>] kthread+0x93/0xa0 [<ffffffff81465d04>] kernel_thread_helper+0x4/0x10 [<ffffffff81079570>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81465d00>] ? gs_change+0x13/0x13 Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* SUNRPC: Ensure that the TCP socket is closed when in CLOSE_WAITTrond Myklebust2012-09-201-5/+16
| | | | | | | | | Instead of doing a shutdown() call, we need to do an actual close(). Ditto if/when the server is sending us junk RPC headers. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Simon Kirby <sim@hostway.ca> Cc: stable@vger.kernel.org
* Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2012-09-195-20/+39
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A small collection of driver fixes/updates and a core fix for 3.6. It contains: - Bug fixes for mtip32xx, and support for new hardware (just addition of IDs). They have been queued up for 3.7 for a few weeks as well. - rate-limit a failing command error message in block core. - A fix for an old cciss bug from Stephen. - Prevent overflow of partition count from Alan." * 'for-linus' of git://git.kernel.dk/linux-block: cciss: fix handling of protocol error blk: add an upper sanity check on partition adding mtip32xx: fix user_buffer check in exec_drive_command mtip32xx: Remove dead code mtip32xx: Change printk to pr_xxxx mtip32xx: Proper reporting of write protect status on big-endian mtip32xx: Increase timeout for standby command mtip32xx: Handle NCQ commands during the security locked state mtip32xx: Add support for new devices block: rate-limit the error message from failing commands
| * cciss: fix handling of protocol errorStephen M. Cameron2012-09-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If a command completes with a status of CMD_PROTOCOL_ERR, this information should be conveyed to the SCSI mid layer, not dropped on the floor. Unlike a similar bug in the hpsa driver, this bug only affects tape drives and CD and DVD ROM drives in the cciss driver, and to induce it, you have to disconnect (or damage) a cable, so it is not a very likely scenario (which would explain why the bug has gone undetected for the last 10 years.) Signed-off-by: Stephen M. Cameron <scameron@beardog.cce.hp.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * blk: add an upper sanity check on partition addingAlan Cox2012-09-181-1/+1
| | | | | | | | | | | | | | | | 65536 should be ludicrous anyway but without it we overflow the memory computation doing the allocation and badness occurs. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * mtip32xx: fix user_buffer check in exec_drive_commandDavid Milburn2012-09-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current user_buffer check is incorrect and causes hdparm to fail # hdparm -I /dev/rssda HDIO_DRIVE_CMD(identify) failed: Input/output error /dev/rssda: Patching linux-3.6-rc5 hdparm works as expected # hdparm -I /dev/rssda /dev/rssda: ATA device, with non-removable media Model Number: DELL_P320h-MTFDGAL350SAH Serial Number: 00000000121302025F01 Firmware Revision: B1442808 <snip> Reported-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: David Milburn <dmilburn@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * mtip32xx: Remove dead codeAsai Thambi S P2012-09-121-6/+0Star
| | | | | | | | | | | | | | | | | | Removed the dead code in mtip_hw_read_registers() and mtip_hw_read_flags(). Reported-by: Coverity Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * mtip32xx: Change printk to pr_xxxxAsai Thambi S P2012-09-121-3/+3
| | | | | | | | | | | | | | | | Changed printk to be compliant with latest style changes Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * mtip32xx: Proper reporting of write protect status on big-endianAsai Thambi S P2012-09-121-2/+2
| | | | | | | | | | | | | | | | Proper reporting of write protect status on big-endian Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * mtip32xx: Increase timeout for standby commandAsai Thambi S P2012-09-121-1/+1
| | | | | | | | | | | | | | | | Increased timeout for standby command to work with larger capacity drives Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * mtip32xx: Handle NCQ commands during the security locked stateAsai Thambi S P2012-09-122-1/+11
| | | | | | | | | | | | | | | | Return error for NCQ commands when the drive is in security locked state Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * mtip32xx: Add support for new devicesAsai Thambi S P2012-09-122-2/+14
| | | | | | | | | | | | | | | | Added supported device IDs in pci table Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com> Signed-off-by: Selvan Mani <smani@micron.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * block: rate-limit the error message from failing commandsYi Zou2012-08-311-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When performing a cable pull test w/ active stress I/O using fio over a dual port Intel 82599 FCoE CNA, w/ 256LUNs on one port and about 32LUNs on the other, it is observed that the system becomes not usable due to scsi-ml being busy printing the error messages for all the failing commands. I don't believe this problem is specific to FCoE and these commands are anyway failing due to link being down (DID_NO_CONNECT), just rate-limit the messages here to solve this issue. v2->v1: use __ratelimit() as Tomas Henzl mentioned as the proper way for rate-limit per function. However, in this case, the failed i/o gets to blk_end_request_err() and then blk_update_request(), which also has to be rate-limited, as added in the v2 of this patch. v3-v2: resolved conflict to apply on current 3.6-rc3 upstream tip. Signed-off-by: Yi Zou <yi.zou@intel.com> Cc: www.Open-FCoE.org <devel@open-fcoe.org> Cc: Tomas Henzl <thenzl@redhat.com> Cc: <linux-scsi@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-shLinus Torvalds2012-09-194-4/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | Pull SuperH fixes from Paul Mundt. * tag 'sh-for-linus' of git://github.com/pmundt/linux-sh: sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling. sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error path sh: intc: Fix up multi-evt irq association.
| * | sh: Fix up TIF_NOTIFY_RESUME sans TIF_SIGPENDING handling.Al Viro2012-09-182-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Al notes, we missed a TIF_NOTIFY_RESUME check which caused any handlers without TIF_SIGPENDING also set to skip the notification: Looks like while it is in the relevant masks *and* checked in do_notify_resume() both on 32bit and 64bit variants since commit ab99c733ae73cce31f2a2434f7099564e5a73d95 ("sh: Make syscall tracer use tracehook notifiers, add TIF_NOTIFY_RESUME.") they are actually *not* reached without simulataneous SIGPENDING, since the actual glue in the callers had not been updated back then and still checks for _TIF_SIGPENDING alone when deciding whether to hit do_notify_resume() or not. Reported-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Tested-by: Nobuhiro Iwamatsu <nobuhiro.iwamatsu.yj@renesas.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: pfc: Release spinlock in sh_pfc_gpio_request_enable() error pathLaurent Pinchart2012-09-181-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | The sh_pfc_gpio_request_enable() function acquires a spinlock but fails to release it before returning if the requested mux type is not supported. Fix this. Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: intc: Fix up multi-evt irq association.Paul Mundt2012-08-201-1/+1
| | | | | | | | | | | | | | | | | | | | | In the multi-evt case we were accidentally associating the parent IRQ, fix this up. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* | | Merge tag 'rpmsg-3.6-fix' of ↵Linus Torvalds2012-09-191-3/+3
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg Pull rpmsg fix from Ohad Ben-Cohen: "A quick rpmsg fix from Fernando, fixing two buggy invocations of dma_free_coherent" * tag 'rpmsg-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg: rpmsg: fix dma_free_coherent dev parameter
| * | | rpmsg: fix dma_free_coherent dev parameterFernando Guzman Lugo2012-09-121-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dma_alloc/free_coherent APIs requires the platform specific remoteproc device as the device parameter. We are passing vdev->dev.parent to the dma_free_coherent function which is the generic rproc device and it is wrong, it has to be vdev->dev.parent->parent instead, same as when we call dma_alloc_coherent function. Signed-off-by: Fernando Guzman Lugo <fernando.lugo@ti.com> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
* | | | Merge tag 'md-3.6-fixes' of git://neil.brown.name/mdLinus Torvalds2012-09-192-2/+10
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull md fixes from NeilBrown: "3 fixes for md in 3.6. One reverts a recent patch which turns out to not be such a good idea. Other two fix minor bugs with the new (since 3.3) 'replacement' code and have been tagged for -stable." * tag 'md-3.6-fixes' of git://neil.brown.name/md: md: make sure metadata is updated when spares are activated or removed. md/raid5: fix calculate of 'degraded' when a replacement becomes active. Revert "md/raid5: For odirect-write performance, do not set STRIPE_PREREAD_ACTIVE."
| * | | | md: make sure metadata is updated when spares are activated or removed.NeilBrown2012-09-191-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It isn't always necessary to update the metadata when spares are removed as the presence-or-not of a spare isn't really important to the integrity of an array. Also activating a spare doesn't always require updating the metadata as the update on 'recovery-completed' is usually sufficient. However the introduction of 'replacement' devices have made these transitions sometimes more important. For example the 'Replacement' flag isn't cleared until the original device is removed, so we need to ensure a metadata update after that 'spare' is removed. So set MD_CHANGE_DEVS whenever a spare is activated or removed, to complement the current situation where it is set when a spare is added or a device is failed (or a number of other less common situations). This is suitable for -stable as out-of-data metadata could lead to data corruption. This is only relevant for 3.3 and later 9when 'replacement' as introduced. Cc: stable@vger.kernel.org Signed-off-by: NeilBrown <neilb@suse.de>
| * | | | md/raid5: fix calculate of 'degraded' when a replacement becomes active.NeilBrown2012-09-191-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a replacement device becomes active, we mark the device that it replaces as 'faulty' so that it can subsequently get removed. However 'calc_degraded' only pays attention to the primary device, not the replacement, so the array appears to become degraded, which is wrong. So teach 'calc_degraded' to consider any replacement if a primary device is faulty. This is suitable for -stable as an incorrect 'degraded' value can confuse md and could lead to data corruption. This is only relevant for 3.3 and later. Cc: stable@vger.kernel.org Reported-by: Robin Hill <robin@robinhill.me.uk> Reported-by: John Drescher <drescherjm@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | | | Revert "md/raid5: For odirect-write performance, do not set ↵NeilBrown2012-09-191-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | STRIPE_PREREAD_ACTIVE." This reverts commit 895e3c5c58a80bb9e4e05d9ac38b4f30e0f97d80. While this patch seemed like a good idea and did help some workloads, it hurts other workloads. Large sequential O_DIRECT writes were faster, Small random O_DIRECT writes were slower. Other changes (batching RAID5 writes) have improved the sequential writes using a different mechanism, so the net result of this patch is definitely negative. So revert it. Reported-by: Shaohua Li <shli@kernel.org> Tested-by: Jianpeng Ma <majianpeng@gmail.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | Merge branch 'for-3.6-fixes' of ↵Linus Torvalds2012-09-192-46/+42Star
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue / powernow-k8 fix from Tejun Heo: "This is the fix for the bug where cpufreq/powernow-k8 was tripping BUG_ON() in try_to_wake_up_local() by migrating workqueue worker to a different CPU. https://bugzilla.kernel.org/show_bug.cgi?id=47301 As discussed, the fix is now two parts - one to reimplement work_on_cpu() so that it doesn't create a new kthread each time and the actual fix which makes powernow-k8 use work_on_cpu() instead of performing manual migration. While pretty late in the merge cycle, both changes are on the safer side. Jiri and I verified two existing users of work_on_cpu() and Duncan confirmed that the powernow-k8 fix survived about 18 hours of testing." * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPU workqueue: reimplement work_on_cpu() using system_wq
| * | | | cpufreq/powernow-k8: workqueue user shouldn't migrate the kworker to another CPUTejun Heo2012-09-191-29/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | powernowk8_target() runs off a per-cpu work item and if the cpufreq_policy->cpu is different from the current one, it migrates the kworker to the target CPU by manipulating current->cpus_allowed. The function migrates the kworker back to the original CPU but this is still broken. Workqueue concurrency management requires the kworkers to stay on the same CPU and powernowk8_target() ends up triggerring BUG_ON(rq != this_rq()) in try_to_wake_up_local() if it contends on fidvid_mutex and sleeps. It is unclear why this bug is being reported now. Duncan says it appeared to be a regression of 3.6-rc1 and couldn't reproduce it on 3.5. Bisection seemed to point to 63d95a91 "workqueue: use @pool instead of @gcwq or @cpu where applicable" which is an non-functional change. Given that the reproduce case sometimes took upto days to trigger, it's easy to be misled while bisecting. Maybe something made contention on fidvid_mutex more likely? I don't know. This patch fixes the bug by using work_on_cpu() instead if @pol->cpu isn't the same as the current one. The code assumes that cpufreq_policy->cpu is kept online by the caller, which Rafael tells me is the case. stable: ed48ece27c ("workqueue: reimplement work_on_cpu() using system_wq") should be applied before this; otherwise, the behavior could be horrible. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Duncan <1i5t5.duncan@cox.net> Tested-by: Duncan <1i5t5.duncan@cox.net> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Andreas Herrmann <andreas.herrmann3@amd.com> Cc: stable@vger.kernel.org Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=47301
| * | | | workqueue: reimplement work_on_cpu() using system_wqTejun Heo2012-09-191-17/+8Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing work_on_cpu() implementation is hugely inefficient. It creates a new kthread, execute that single function and then let the kthread die on each invocation. Now that system_wq can handle concurrent executions, there's no advantage of doing this. Reimplement work_on_cpu() using system_wq which makes it simpler and way more efficient. stable: While this isn't a fix in itself, it's needed to fix a workqueue related bug in cpufreq/powernow-k8. AFAICS, this shouldn't break other existing users. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Len Brown <lenb@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: stable@vger.kernel.org
* | | | | Merge tag 'hwspinlock-3.6-fix' of ↵Linus Torvalds2012-09-181-1/+2
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock Pull hwspinlock fix from Ohad Ben-Cohen: "A single hwspinlock fix by Wei Yongjun, which prevents potential NULL dereferences" * tag 'hwspinlock-3.6-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/hwspinlock: hwspinlock/core: move the dereference below the NULL test
| * | | | | hwspinlock/core: move the dereference below the NULL testWei Yongjun2012-09-101-1/+2
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dereference should be moved below the NULL test. spatch with a semantic match is used to found this. (http://coccinelle.lip6.fr/) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
* | | | | vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill()Miklos Szeredi2012-09-182-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IBM reported a soft lockup after applying the fix for the rename_lock deadlock. Commit c83ce989cb5f ("VFS: Fix the nfs sillyrename regression in kernel 2.6.38") was found to be the culprit. The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the dentry was killed. This flag can be set on non-killed dentries too, which results in infinite retries when trying to traverse the dentry tree. This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is only set in d_kill() and makes try_to_ascend() test only this flag. IBM reported successful test results with this patch. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge branch 'for-3.6-fixes' of ↵Linus Torvalds2012-09-181-2/+10
|\ \ \ \ \ | | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull another workqueue fix from Tejun Heo: "Unfortunately, yet another late fix. This too is discovered and fixed by Lai. This bug was introduced during this merge window by commit 25511a477657 ("workqueue: reimplement CPU online rebinding to handle idle workers") which started using WORKER_REBIND flag for idle rebind too. The bug is relatively easy to trigger if the CPU rapidly goes through off, on and then off (and stay off). The fix is on the safer side. This hasn't been on linux-next yet but I'm pushing early so that it can get more exposure before v3.6 release." * 'for-3.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()
| * | | | workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn()Lai Jiangshan2012-09-181-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed (CPU is down again). This used to be okay because the flag wasn't used for anything else. However, after 25511a477 "workqueue: reimplement CPU online rebinding to handle idle workers", WORKER_REBIND is also used to command idle workers to rebind. If not cleared, the worker may confuse the next CPU_UP cycle by having REBIND spuriously set or oops / get stuck by prematurely calling idle_worker_rebind(). WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5 00() Hardware name: Bochs Modules linked in: test_wq(O-) Pid: 33, comm: kworker/1:1 Tainted: G O 3.6.0-rc1-work+ #3 Call Trace: [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500 [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 ---[ end trace e977cf20f4661968 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500 PGD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: test_wq(O-) CPU 0 Pid: 33, comm: kworker/1:1 Tainted: G W O 3.6.0-rc1-work+ #3 Bochs Bochs RIP: 0010:[<ffffffff810b3db0>] [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP: 0018:ffff88001e1c9de0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009 RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580 R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900 FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900) Stack: ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010 ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340 Call Trace: [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b RIP [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP <ffff88001e1c9de0> CR2: 0000000000000000 There was no reason to keep WORKER_REBIND on failure in the first place - WORKER_UNBOUND is guaranteed to be set in such cases preventing incorrectly activating concurrency management. Always clear WORKER_REBIND. tj: Updated comment and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | | Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds2012-09-1813-20/+60
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge fixes from Andrew Morton: "13 patches. 12 are fixes and one is a little preparatory thing for Andi." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (13 commits) memory hotplug: fix section info double registration bug mm/page_alloc: fix the page address of higher page's buddy calculation drivers/rtc/rtc-twl.c: ensure all interrupts are disabled during probe compiler.h: add __visible pid-namespace: limit value of ns_last_pid to (0, max_pid) include/net/sock.h: squelch compiler warning in sk_rmem_schedule() slub: consider pfmemalloc_match() in get_partial_node() slab: fix starting index for finding another object slab: do ClearSlabPfmemalloc() for all pages of slab nbd: clear waiting_queue on shutdown MAINTAINERS: fix TXT maintainer list and source repo path mm/ia64: fix a memory block size bug memory hotplug: reset pgdat->kswapd to NULL if creating kernel thread fails
| * | | | | memory hotplug: fix section info double registration bugqiuxishi2012-09-181-6/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There may be a bug when registering section info. For example, on my Itanium platform, the pfn range of node0 includes the other nodes, so other nodes' section info will be double registered, and memmap's page count will equal to 3. node0: start_pfn=0x100, spanned_pfn=0x20fb00, present_pfn=0x7f8a3, => 0x000100-0x20fc00 node1: start_pfn=0x80000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x080000-0x100000 node2: start_pfn=0x100000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x100000-0x180000 node3: start_pfn=0x180000, spanned_pfn=0x80000, present_pfn=0x80000, => 0x180000-0x200000 free_all_bootmem_node() register_page_bootmem_info_node() register_page_bootmem_info_section() When hot remove memory, we can't free the memmap's page because page_count() is 2 after put_page_bootmem(). sparse_remove_one_section() free_section_usemap() free_map_bootmem() put_page_bootmem() [akpm@linux-foundation.org: add code comment] Signed-off-by: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Jiang Liu <jiang.liu@huawei.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>