summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * device_cgroup: fix the comment format for recently added functionsAristeu Rozanski2014-05-041-17/+16Star
| | | | | | | | | | | | | | | | | | Moving more extensive explanations to the end of the comment. Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * device_cgroup: rework device access check and exception checkingAristeu Rozanski2014-04-221-40/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever a device file is opened and checked against current device cgroup rules, it uses the same function (may_access()) as when a new exception rule is added by writing devices.{allow,deny}. And in both cases, the algorithm is the same, doesn't matter the behavior. First problem is having device access to be considered the same as rule checking. Consider the following structure: A (default behavior: allow, exceptions disallow access) \ B (default behavior: allow, exceptions disallow access) A new exception is added to B by writing devices.deny: c 12:34 rw When checking if that exception is allowed in may_access(): if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) { if (behavior == DEVCG_DEFAULT_ALLOW) { /* the exception will deny access to certain devices */ return true; Which is ok, since B is not getting more privileges than A, it doesn't matter and the rule is accepted Now, consider it's a device file open check and the process belongs to cgroup B. The access will be generated as: behavior: allow exception: c 12:34 rw The very same chunk of code will allow it, even if there's an explicit exception telling to do otherwise. A simple test case: # mkdir new_group # cd new_group # echo $$ >tasks # echo "c 1:3 w" >devices.deny # echo >/dev/null # echo $? 0 This is a serious bug and was introduced on c39a2a3018f8 devcg: prepare may_access() for hierarchy support To solve this problem, the device file open function was split from the new exception check. Second problem is how exceptions are processed by may_access(). The first part of the said function tries to match fully with an existing exception: list_for_each_entry_rcu(ex, &dev_cgroup->exceptions, list) { if ((refex->type & DEV_BLOCK) && !(ex->type & DEV_BLOCK)) continue; if ((refex->type & DEV_CHAR) && !(ex->type & DEV_CHAR)) continue; if (ex->major != ~0 && ex->major != refex->major) continue; if (ex->minor != ~0 && ex->minor != refex->minor) continue; if (refex->access & (~ex->access)) continue; match = true; break; } That means the new exception should be contained into an existing one to be considered a match: New exception Existing match? notes b 12:34 rwm b 12:34 rwm yes b 12:34 r b *:34 rw yes b 12:34 rw b 12:34 w no extra "r" b *:34 rw b 12:34 rw no too broad "*" b *:34 rw b *:34 rwm yes Which is fine in some cases. Consider: A (default behavior: deny, exceptions allow access) \ B (default behavior: deny, exceptions allow access) In this case the full match makes sense, the new exception cannot add more access than the parent allows But this doesn't always work, consider: A (default behavior: allow, exceptions disallow access) \ B (default behavior: deny, exceptions allow access) In this case, a new exception in B shouldn't match any of the exceptions in A, after all you can't allow something that was forbidden by A. But consider this scenario: New exception Existing in A match? outcome b 12:34 rw b 12:34 r no exception is accepted Because the new exception has "w" as extra, it doesn't match, so it'll be added to B's exception list. The same problem can happen during a file access check. Consider a cgroup with allow as default behavior: Access Exception match? b 12:34 rw b 12:34 r no In this case, the access didn't match any of the exceptions in the cgroup, which is required since exceptions will disallow access. To solve this problem, two new functions were created to match an exception either fully or partially. In the example above, a partial check will be performed and it'll produce a match since at least "b 12:34 r" from "b 12:34 rw" access matches. Cc: cgroups@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | Merge branch 'for-3.16' of ↵Tejun Heo2014-05-133-3/+35
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu into for-3.16 Pull to receive percpu_ref_tryget[_live]() changes. Planned cgroup changes will make use of them. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | percpu-refcount: implement percpu_ref_tryget()Tejun Heo2014-05-091-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement percpu_ref_tryget() which fails if the refcnt already reached zero. Note that this is different from the recently renamed percpu_ref_tryget_live() which fails if the refcnt has been killed and is draining the remaining references. percpu_ref_tryget() succeeds on a killed refcnt as long as its current refcnt is above zero. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <kmo@daterainc.com>
| * | percpu-refcount: rename percpu_ref_tryget() to percpu_ref_tryget_live()Tejun Heo2014-05-092-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu_ref_tryget() is different from the usual tryget semantics in that it fails if the refcnt is in its dying stage even if the refcnt hasn't reached zero yet. We're about to introduce the more conventional tryget and the current one has only one user. Let's rename it to percpu_ref_tryget_live() so that it explicitly signifies the peculiarities of its semantics. This is pure rename. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Kent Overstreet <kmo@daterainc.com>
| * | percpu: Replace __get_cpu_var with this_cpu_ptrChristoph Lameter2014-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __this_cpu_ptr is being phased out. Use raw_cpu_ptr instead which was introduced in 3.15-rc1. One case of using __get_cpu_var in the get_cpu_var macro for address calculation was remaining in include/linux/percpu.h. tj: Updated patch description. Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | cgroup: remove unused CGRP_SANE_BEHAVIORTejun Heo2014-05-071-2/+0Star
| | | | | | | | | | | | | | | | | | | | | This cgroup flag has never been used. Only CGRP_ROOT_SANE_BEHAVIOR is used. Remove it. Signed-off-by: Tejun Heo <tj@kernel.org>
* | | kernel/cpuset.c: convert printk to pr_foo()Fabian Frederick2014-05-061-7/+4Star
| | | | | | | | | | | | | | | | | | | | | Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | kernel/cpuset.c: kernel-doc fixesFabian Frederick2014-05-061-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch also converts seq_printf to seq_puts Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Acked-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | kernel/cgroup.c: fix 2 kernel-doc warningsFabian Frederick2014-05-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix typo and variable name. tj: Updated @cgrp argument description in cgroup_destroy_css_killed() Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | cgroup, memcg: implement css->id and convert css_from_id() to use itTejun Heo2014-05-043-23/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Until now, cgroup->id has been used to identify all the associated csses and css_from_id() takes cgroup ID and returns the matching css by looking up the cgroup and then dereferencing the css associated with it; however, now that the lifetimes of cgroup and css are separate, this is incorrect and breaks on the unified hierarchy when a controller is disabled and enabled back again before the previous instance is released. This patch adds css->id which is a subsystem-unique ID and converts css_from_id() to look up by the new css->id instead. memcg is the only user of css_from_id() and also converted to use css->id instead. For traditional hierarchies, this shouldn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jianyu Zhan <nasa4836@gmail.com> Acked-by: Li Zefan <lizefan@huawei.com>
* | | cgroup: update init_css() into init_and_link_css()Tejun Heo2014-05-041-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | init_css() takes the cgroup the new css belongs to as an argument and initializes the new css's ->cgroup and ->parent pointers but doesn't acquire the matching reference counts. After the previous patch, create_css() puts init_css() and reference acquisition right next to each other. Let's move reference acquistion into init_css() and rename the function to init_and_link_css(). This makes sense and is easier to follow. This makes the root csses to hold a reference on cgrp_dfl_root.cgrp, which is harmless. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | cgroup: use RCU free in create_css() failure pathTejun Heo2014-05-041-6/+5Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when create_css() fails in the middle, the half-initialized css is freed by invoking cgroup_subsys->css_free() directly. This patch updates the function so that it invokes RCU free path instead. As the RCU free path puts the parent css and owning cgroup, their references are now acquired right after a new css is successfully allocated. This doesn't make any visible difference now but is to enable implementing css->id and RCU protected lookup by such IDs. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | cgroup: protect cgroup_root->cgroup_idr with a spinlockTejun Heo2014-05-041-8/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cgroup_root->cgroup_idr is protected by cgroup_mutex, which ends up requiring cgroup_put() to be invoked under sleepable context. This is okay for now but is an unusual requirement and we'll soon add css->id which will have the same problem but won't be able to simply grab cgroup_mutex as removal will have to happen from css_release() which can't sleep. Introduce cgroup_idr_lock and idr_alloc/replace/remove() wrappers which protects the idr operations with the lock and use them for cgroup_root->cgroup_idr. cgroup_put() no longer needs to grab cgroup_mutex and css_from_id() is updated to always require RCU read lock instead of either RCU read lock or cgroup_mutex, which doesn't affect the existing users. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | cgroup, memcg: allocate cgroup ID from 1Tejun Heo2014-05-043-10/+6Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, cgroup->id is allocated from 0, which is always assigned to the root cgroup; unfortunately, memcg wants to use ID 0 to indicate invalid IDs and ends up incrementing all IDs by one. It's reasonable to reserve 0 for special purposes. This patch updates cgroup core so that ID 0 is not used and the root cgroups get ID 1. The ID incrementing is removed form memcg. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | cgroup: make flags and subsys_masks unsigned intTejun Heo2014-05-042-23/+22Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no reason to use atomic bitops for cgroup_subsys_state->flags, cgroup_root->flags and various subsys_masks. This patch updates those to use bitwise and/or operations instead and converts them form unsigned long to unsigned int. This makes the fields occupy (marginally) smaller space and makes it clear that they don't require atomicity. This patch doesn't cause any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Li Zefan <lizefan@huawei.com>
* | | cgroup: Use more current logging styleJoe Perches2014-04-261-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use pr_fmt and remove embedded prefixes. Realign modified multi-line statements to open parenthesis. Convert embedded function name to "%s: ", __func__ Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | cgroup: replace pr_warning with preferred pr_warnJianyu Zhan2014-04-261-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As suggested by scripts/checkpatch.pl, substitude all pr_warning() with pr_warn(). No functional change. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | cgroup: remove orphaned cgroup_pidlist_seq_operationsJianyu Zhan2014-04-261-11/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 6612f05b88fa309c9 ("cgroup: unify pidlist and other file handling") has removed the only user of cgroup_pidlist_seq_operations : cgroup_pidlist_open(). This patch removes it. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | cgroup: clean up obsolete comment for parse_cgroupfs_options()Jianyu Zhan2014-04-261-8/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1d5be6b287c8efc87 ("cgroup: move module ref handling into rebind_subsystems()") makes parse_cgroupfs_options() no longer takes refcounts on subsystems. And unified hierachy makes parse_cgroupfs_options not need to call with cgroup_mutex held to protect the cgroup_subsys[]. So this patch removes BUG_ON() and the comment. As the comment doesn't contain useful information afterwards, the whole comment is removed. Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | cgroup: add documentation about unified hierarchyTejun Heo2014-04-261-0/+359
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unified hierarchy will be the new version of cgroup interface. This patch adds Documentation/cgroups/unified-hierarchy.txt which describes the design and rationales of unified hierarchy. v2: Grammatical updates as per Randy Dunlap's review. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org>
* | | cgroup: implement cgroup.populated for the default hierarchyTejun Heo2014-04-262-4/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup users often need a way to determine when a cgroup's subhierarchy becomes empty so that it can be cleaned up. cgroup currently provides release_agent for it; unfortunately, this mechanism is riddled with issues. * It delivers events by forking and execing a userland binary specified as the release_agent. This is a long deprecated method of notification delivery. It's extremely heavy, slow and cumbersome to integrate with larger infrastructure. * There is single monitoring point at the root. There's no way to delegate management of a subtree. * The event isn't recursive. It triggers when a cgroup doesn't have any tasks or child cgroups. Events for internal nodes trigger only after all children are removed. This again makes it impossible to delegate management of a subtree. * Events are filtered from the kernel side. "notify_on_release" file is used to subscribe to or suppress release event. This is unnecessarily complicated and probably done this way because event delivery itself was expensive. This patch implements interface file "cgroup.populated" which can be used to monitor whether the cgroup's subhierarchy has tasks in it or not. Its value is 0 if there is no task in the cgroup and its descendants; otherwise, 1, and kernfs_notify() notificaiton is triggers when the value changes, which can be monitored through poll and [di]notify. This is a lot ligther and simpler and trivially allows delegating management of subhierarchy - subhierarchy monitoring can block further propgation simply by putting itself or another process in the root of the subhierarchy and monitor events that it's interested in from there without interfering with monitoring higher in the tree. v2: Patch description updated as per Serge. v3: "cgroup.subtree_populated" renamed to "cgroup.populated". The subtree_ prefix was a bit confusing because "cgroup.subtree_control" uses it to denote the tree rooted at the cgroup sans the cgroup itself while the populated state includes the cgroup itself. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Lennart Poettering <lennart@poettering.net>
* | | Merge branch 'driver-core-next' of ↵Tejun Heo2014-04-26962-27659/+9167Star
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core into for-3.16 Pull in driver-core-next to receive kernfs_notify() updates which will be used by the planned "cgroup.populated" implementation. Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | kobject: Make support for uevent_helper optional.Michael Marineau2014-04-255-9/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support for uevent_helper, aka hotplug, is not required on many systems these days but it can still be enabled via sysfs or sysctl. Reported-by: Darren Shepherd <darren.s.shepherd@gmail.com> Signed-off-by: Michael Marineau <mike@marineau.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | kernfs: make kernfs_notify() trigger inotify events tooTejun Heo2014-04-251-6/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernfs_notify() is used to indicate either new data is available or the content of a file has changed. It currently only triggers poll which may not be the most convenient to monitor especially when there are a lot to monitor. Let's hook it up to fsnotify too so that the events can be monitored via inotify too. fsnotify_modify() requires file * but kernfs_notify() doesn't have any specific file associated; however, we can walk all super_blocks associated with a kernfs_root and as kernfs always associate one ino with inode and one dentry with an inode, it's trivial to look up the dentry associated with a given kernfs_node. As any active monitor would pin dentry, just looking up existing dentry is enough. This patch looks up the dentry associated with the specified kernfs_node and generates events equivalent to fsnotify_modify(). Note that as fsnotify doesn't provide fsnotify_modify() equivalent which can be called with dentry, kernfs_notify() directly calls fsnotify_parent() and fsnotify(). It might be better to add a wrapper in fsnotify.h instead. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: John McCutchan <john@johnmccutchan.com> Cc: Robert Love <rlove@rlove.org> Cc: Eric Paris <eparis@parisplace.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | kernfs: implement kernfs_root->supers listTejun Heo2014-04-254-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there's no way to find out which super_blocks are associated with a given kernfs_root. Let's implement it - the planned inotify extension to kernfs_notify() needs it. Make kernfs_super_info point back to the super_block and chain it at kernfs_root->supers. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | Linux 3.15-rc2Linus Torvalds2014-04-201-1/+1
| | | |
| * | | Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dmaLinus Torvalds2014-04-205-6/+18
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull slave-dmaengine fixes from Vinod Koul: "Back from long weekend here in India and now the time to send fixes for slave dmaengine. - Dan's fix of sirf xlate code - Jean's fix for timberland - edma fixes by Sekhar for SG handling and Yuan for changing init call" * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma: dma: fix eDMA driver as a subsys_initcall dmaengine: sirf: off by one in of_dma_sirfsoc_xlate() platform: Fix timberdale dependencies dma: edma: fix incorrect SG list handling
| | * | | dma: fix eDMA driver as a subsys_initcallYuan Yao2014-04-161-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of some driver base on DMA, changed the initcall order as subsys_initcall. Signed-off-by: Yuan Yao <yao.yuan@freescale.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
| | * | | dmaengine: sirf: off by one in of_dma_sirfsoc_xlate()Dan Carpenter2014-04-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ">" here should be ">=" or we are one step beyond the end of the sdma->channels[] array. Fixes: 2e041c94628c ('dmaengine: sirf: enable generic dt binding for dma channels') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Barry Song <Baohua.Song@csr.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
| | * | | platform: Fix timberdale dependenciesJean Delvare2014-04-162-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VIDEO_TIMBERDALE selects TIMB_DMA which itself depends on MFD_TIMBERDALE, so VIDEO_TIMBERDALE should either select or depend on MFD_TIMBERDALE as well. I chose to make it depend on it because I think it makes more sense and it is consistent with what other options are doing. Adding a "|| HAS_IOMEM" to the TIMB_DMA dependencies silenced the kconfig warning about unmet direct dependencies but it was wrong: without MFD_TIMBERDALE, TIMB_DMA is useless as the driver has no device to bind to. Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
| | * | | dma: edma: fix incorrect SG list handlingSekhar Nori2014-04-141-2/+4
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code to handle any length SG lists calls edma_resume() even before edma_start() is called. This is incorrect because edma_resume() enables edma events on the channel after which CPU (in edma_start) cannot clear posted events by writing to ECR (per the EDMA user's guide). Because of this EDMA transfers fail to start if due to some reason there is a pending EDMA event registered even before EDMA transfers are started. This can happen if an EDMA event is a byproduct of device initialization. Fix this by calling edma_resume() only if it is not the first batch of MAX_NR_SG elements. Without this patch, MMC/SD fails to function on DA850 EVM with DMA. The behaviour is triggered by specific IP and this can explain why the issue was not reported before (example with MMC/SD on AM335x). Tested on DA850 EVM and AM335x EVM-SK using MMC/SD card. Cc: stable@vger.kernel.org # v3.12.x+ Cc: Joel Fernandes <joelf@ti.com> Acked-by: Joel Fernandes <joelf@ti.com> Tested-by: Jon Ringle <jringle@gridpoint.com> Tested-by: Alexander Holler <holler@ahsoftware.de> Reported-by: Jon Ringle <jringle@gridpoint.com> Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com>
| * | | Merge tag 'iommu-fixes-v3.15-rc1' of ↵Linus Torvalds2014-04-203-6/+11
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: "Fixes for regressions: - fix wrong IOMMU enumeration causing some SCSI device drivers initialization failures - ARM-SMMU fixes for a panic condition and a wrong return value" * tag 'iommu-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/arm-smmu: fix panic in arm_smmu_alloc_init_pte iommu/arm-smmu: Return 0 on unmap failure iommu/vt-d: fix bug in matching PCI devices with DRHD/RMRR descriptors iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridges iommu/vt-d: fix memory leakage caused by commit ea8ea46
| | * \ \ Merge git://git.infradead.org/iommu-2.6 into iommu/fixesJoerg Roedel2014-04-162-4/+9
| | |\ \ \
| | | * | | iommu/vt-d: fix bug in matching PCI devices with DRHD/RMRR descriptorsJiang Liu2014-04-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit "59ce0515cdaf iommu/vt-d: Update DRHD/RMRR/ATSR device scope caches when PCI hotplug happens" introduces a bug, which fails to match PCI devices with DMAR device scope entries if PCI path array in the entry has more than one level. For example, it fails to handle [1D2h 0466 1] Device Scope Entry Type : 01 [1D3h 0467 1] Entry Length : 0A [1D4h 0468 2] Reserved : 0000 [1D6h 0470 1] Enumeration ID : 00 [1D7h 0471 1] PCI Bus Number : 00 [1D8h 0472 2] PCI Path : 1C,04 [1DAh 0474 2] PCI Path : 00,02 And cause DMA failure on HP DL980 as: DMAR:[fault reason 02] Present bit in context entry is clear dmar: DRHD: handling fault status reg 602 dmar: DMAR:[DMA Read] Request device [02:00.2] fault addr 7f61e000 Reported-and-tested-by: Davidlohr Bueso <davidlohr@hp.com> Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| | | * | | iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridgesDavid Woodhouse2014-04-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 146922ec79 ("iommu/vt-d: Make get_domain_for_dev() take struct device") introduced new variables bridge_bus and bridge_devfn to identify the upstream PCIe to PCI bridge responsible for the given target device. Leaving the original bus/devfn variables to identify the target device itself, now that it is no longer assumed to be PCI and we can no longer trivially find that information. However, the patch failed to correctly use the new variables in all cases; instead using the as-yet-uninitialised 'bus' and 'devfn' variables. Reported-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| | | * | | iommu/vt-d: fix memory leakage caused by commit ea8ea46Jiang Liu2014-04-131-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ea8ea46 "iommu/vt-d: Clean up and fix page table clear/free behaviour" introduces possible leakage of DMA page tables due to: for (pte = page_address(pg); !first_pte_in_page(pte); pte++) { if (dma_pte_present(pte) && !dma_pte_superpage(pte)) freelist = dma_pte_list_pagetables(domain, level - 1, pte, freelist); } For the first pte in a page, first_pte_in_page(pte) will always be true, thus dma_pte_list_pagetables() will never be called and leak DMA page tables if level is bigger than 1. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| | * | | | iommu/arm-smmu: fix panic in arm_smmu_alloc_init_pteBin Wang2014-04-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel panic happened when iommu_unmap a buffer larger than 2MB, more than expected pmd entries got “invalidated”, due to a wrong range passed to arm_smmu_alloc_init_pte. it was likely a typo, now we fix it, passing the correct "end" address to arm_smmu_alloc_init_pte. Signed-off-by: Bin Wang <binw@marvell.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| | * | | | iommu/arm-smmu: Return 0 on unmap failureLaurent Pinchart2014-04-151-1/+1
| | | |/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The IOMMU core expects the unmap operation to return the number of bytes that have been unmapped or 0 on failure, a negative return value being treated like a number of bytes. Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
| * | | | Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2014-04-204-2/+12
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf tooling fixes from Ingo Molnar: "Three small tooling fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf tools: Improve error reporting perf tools: Adjust symbols in VDSO perf kvm: Fix 'Min time' counting in report command
| | * \ \ \ Merge tag 'perf-urgent-for-mingo' of ↵Ingo Molnar2014-04-204-2/+12
| | |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf into perf/urgent Pull perf/urgent fixes from Jiri Olsa: User visible changes: * Adjust symbols in VDSO to properly resolve its function names (Vladimir Nikulichev) * Improve error reporting for record session failure (Adrien BAK) * Fix 'Min time' counting in report command (Alexander Yarygin) Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | * | | | perf tools: Improve error reportingAdrien BAK2014-04-202-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current version, when using perf record, if something goes wrong in tools/perf/builtin-record.c:375 session = perf_session__new(file, false, NULL); The error message: "Not enough memory for reading per file header" is issued. This error message seems to be outdated and is not very helpful. This patch proposes to replace this error message by "Perf session creation failed" I believe this issue has been brought to lkml: https://lkml.org/lkml/2014/2/24/458 although this patch only tackles a (small) part of the issue. Additionnaly, this patch improves error reporting in tools/perf/util/data.c open_file_write. Currently, if the call to open fails, the user is unaware of it. This patch logs the error, before returning the error code to the caller. Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Adrien BAK <adrien.bak@metascale.org> Link: http://lkml.kernel.org/r/1397786443.3093.4.camel@beast [ Reorganize the changelog into paragraphs ] [ Added empty line after fd declaration in open_file_write ] Signed-off-by: Jiri Olsa <jolsa@redhat.com>
| | | * | | | perf tools: Adjust symbols in VDSOVladimir Nikulichev2014-04-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pert-report doesn't resolve function names in VDSO: $ perf report --stdio -g flat,0.0,15,callee --sort pid ... 8.76% 0x7fff6b1fe861 __gettimeofday ACE_OS::gettimeofday() ... In this case symbol values should be adjusted the same way as for executables, relocatable objects and prelinked libraries. After fix: $ perf report --stdio -g flat,0.0,15,callee --sort pid ... 8.76% __vdso_gettimeofday __gettimeofday ACE_OS::gettimeofday() Signed-off-by: Vladimir Nikulichev <nvs@tbricks.com> Tested-by: Namhyung Kim <namhyung@kernel.org> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Link: http://lkml.kernel.org/r/969812.163009436-sendEmail@nvs Signed-off-by: Jiri Olsa <jolsa@redhat.com>
| | | * | | | perf kvm: Fix 'Min time' counting in report commandAlexander Yarygin2014-04-201-0/+1
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Every event in the perf-kvm has a 'stats' structure, which contains max/min/average/etc times of handling this event. The problem is that the 'perf-kvm stat report' command always shows that 'min time' is 0us for every event. Example: # perf kvm stat report Analyze events for all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time [..] 0xB2 MSCH 12 0.07% 0.00% 0us 8us 7.31us ( +- 2.11% ) 0xB2 CHSC 12 0.07% 0.00% 0us 18us 9.39us ( +- 9.49% ) 0xB2 STPX 8 0.05% 0.00% 0us 2us 1.88us ( +- 7.18% ) 0xB2 STSI 7 0.04% 0.00% 0us 44us 16.49us ( +- 38.20% ) [..] This happens because the 'stats' structure is not initialized and stats->min equals to 0. Lets initialize the structure for every event after its allocation using init_stats() function. This initializes stats->min to -1 and makes 'Min time' statistics counting work: # perf kvm stat report Analyze events for all VCPUs: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time [..] 0xB2 MSCH 12 0.07% 0.00% 6us 8us 7.31us ( +- 2.11% ) 0xB2 CHSC 12 0.07% 0.00% 7us 18us 9.39us ( +- 9.49% ) 0xB2 STPX 8 0.05% 0.00% 1us 2us 1.88us ( +- 7.18% ) 0xB2 STSI 7 0.04% 0.00% 1us 44us 16.49us ( +- 38.20% ) [..] Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Ahern <dsahern@gmail.com> Link: http://lkml.kernel.org/r/1397053319-2130-3-git-send-email-borntraeger@de.ibm.com [ Fixing the perf examples changelog output ] Signed-off-by: Jiri Olsa <jolsa@redhat.com>
| * | | | | coredump: fix va_list corruptionEric Dumazet2014-04-191-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A va_list needs to be copied in case it needs to be used twice. Thanks to Hugh for debugging this issue, leading to various panics. Tested: lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern 'produce_core' is simply : main() { *(int *)0 = 1;} lpq84:~# ./produce_core Segmentation fault (core dumped) lpq84:~# dmesg | tail -1 [ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed Notice the last argument was replaced by a NULL (we were lucky enough to not crash, but do not try this on your production machine !) After fix : lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern lpq83:~# ./produce_core Segmentation fault lpq83:~# dmesg | tail -1 [ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Hugh Dickins <hughd@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2014-04-192-12/+10Star
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "This fixes the preemption-count imbalance crash reported by Owen Kibel" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce: Fix CMCI preemption bugs
| | * | | | | x86/mce: Fix CMCI preemption bugsIngo Molnar2014-04-172-12/+10Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms") Added two preemption bugs: - machine_check_poll() does a get_cpu_var() without a matching put_cpu_var(), which causes preemption imbalance and crashes upon bootup. - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. To fix these bugs fix the imbalance and change cmci_discover_lock to a regular spinlock. Reported-by: Owen Kibel <qmewlo@gmail.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chen, Gong <gong.chen@linux.intel.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Todorov <atodorov@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> -- arch/x86/kernel/cpu/mcheck/mce.c | 4 +--- arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++--------- 2 files changed, 10 insertions(+), 12 deletions(-)
| * | | | | | Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds2014-04-194-11/+32
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two fixes: - a SCHED_DEADLINE task selection fix - a sched/numa related lockdep splat fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Check for stop task appearance when balancing happens sched/numa: Fix task_numa_free() lockdep splat
| | * | | | | | sched: Check for stop task appearance when balancing happensKirill Tkhai2014-04-173-5/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to do it like we do for the other higher priority classes.. Signed-off-by: Kirill Tkhai <tkhai@yandex.ru> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/336561397137116@web27h.yandex.ru Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | | | | | sched/numa: Fix task_numa_free() lockdep splatMike Galbraith2014-04-112-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sasha reported that lockdep claims that the following commit: made numa_group.lock interrupt unsafe: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()") While I don't see how that could be, given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockups upon gripe with numa=fake=4 go away. Reported-by: Sasha Levin <sasha.levin@oracle.com> Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()") Signed-off-by: Mike Galbraith <bitbucket@online.de> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Cc: mgorman@suse.com Cc: akpm@linux-foundation.org Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net Signed-off-by: Ingo Molnar <mingo@kernel.org>