summaryrefslogtreecommitdiffstats
path: root/include
diff options
context:
space:
mode:
authorJohannes Weiner2016-07-21 00:44:57 +0200
committerLinus Torvalds2016-07-23 03:25:54 +0200
commit73f576c04b9410ed19660f74f97521bee6e1c546 (patch)
treea81083db32e78fbc1958f6364401b4135ea0f0d5 /include
parentMerge tag 'for-linus-20160715' of git://git.infradead.org/linux-mtd (diff)
downloadkernel-qcow2-linux-73f576c04b9410ed19660f74f97521bee6e1c546.tar.gz
kernel-qcow2-linux-73f576c04b9410ed19660f74f97521bee6e1c546.tar.xz
kernel-qcow2-linux-73f576c04b9410ed19660f74f97521bee6e1c546.zip
mm: memcontrol: fix cgroup creation failure after many small jobs
The memory controller has quite a bit of state that usually outlives the cgroup and pins its CSS until said state disappears. At the same time it imposes a 16-bit limit on the CSS ID space to economically store IDs in the wild. Consequently, when we use cgroups to contain frequent but small and short-lived jobs that leave behind some page cache, we quickly run into the 64k limitations of outstanding CSSs. Creating a new cgroup fails with -ENOSPC while there are only a few, or even no user-visible cgroups in existence. Although pinning CSSs past cgroup removal is common, there are only two instances that actually need an ID after a cgroup is deleted: cache shadow entries and swapout records. Cache shadow entries reference the ID weakly and can deal with the CSS having disappeared when it's looked up later. They pose no hurdle. Swap-out records do need to pin the css to hierarchically attribute swapins after the cgroup has been deleted; though the only pages that remain swapped out after offlining are tmpfs/shmem pages. And those references are under the user's control, so they are manageable. This patch introduces a private 16-bit memcg ID and switches swap and cache shadow entries over to using that. This ID can then be recycled after offlining when the CSS remains pinned only by objects that don't specifically need it. This script demonstrates the problem by faulting one cache page in a new cgroup and deleting it again: set -e mkdir -p pages for x in `seq 128000`; do [ $((x % 1000)) -eq 0 ] && echo $x mkdir /cgroup/foo echo $$ >/cgroup/foo/cgroup.procs echo trex >pages/$x echo $$ >/cgroup/cgroup.procs rmdir /cgroup/foo done When run on an unpatched kernel, we eventually run out of possible IDs even though there are no visible cgroups: [root@ham ~]# ./cssidstress.sh [...] 65000 mkdir: cannot create directory '/cgroup/foo': No space left on device After this patch, the IDs get released upon cgroup destruction and the cache and css objects get released once memory reclaim kicks in. [hannes@cmpxchg.org: init the IDR] Link: http://lkml.kernel.org/r/20160621154601.GA22431@cmpxchg.org Fixes: b2052564e66d ("mm: memcontrol: continue cache reclaim from offlined groups") Link: http://lkml.kernel.org/r/20160617162516.GD19084@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: John Garcia <john.garcia@mesosphere.io> Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: Nikolay Borisov <kernel@kyup.com> Cc: <stable@vger.kernel.org> [3.19+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include')
-rw-r--r--include/linux/memcontrol.h25
1 files changed, 10 insertions, 15 deletions
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index a805474df4ab..56e6069d2452 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -97,6 +97,11 @@ enum mem_cgroup_events_target {
#define MEM_CGROUP_ID_SHIFT 16
#define MEM_CGROUP_ID_MAX USHRT_MAX
+struct mem_cgroup_id {
+ int id;
+ atomic_t ref;
+};
+
struct mem_cgroup_stat_cpu {
long count[MEMCG_NR_STAT];
unsigned long events[MEMCG_NR_EVENTS];
@@ -172,6 +177,9 @@ enum memcg_kmem_state {
struct mem_cgroup {
struct cgroup_subsys_state css;
+ /* Private memcg ID. Used to ID objects that outlive the cgroup */
+ struct mem_cgroup_id id;
+
/* Accounted resources */
struct page_counter memory;
struct page_counter swap;
@@ -330,22 +338,9 @@ static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg)
if (mem_cgroup_disabled())
return 0;
- return memcg->css.id;
-}
-
-/**
- * mem_cgroup_from_id - look up a memcg from an id
- * @id: the id to look up
- *
- * Caller must hold rcu_read_lock() and use css_tryget() as necessary.
- */
-static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id)
-{
- struct cgroup_subsys_state *css;
-
- css = css_from_id(id, &memory_cgrp_subsys);
- return mem_cgroup_from_css(css);
+ return memcg->id.id;
}
+struct mem_cgroup *mem_cgroup_from_id(unsigned short id);
/**
* parent_mem_cgroup - find the accounting parent of a memcg