summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* nfs: modify pg_test interface to return size_tWeston Andros Adamson2014-05-297-23/+62
| | | | | | | | | | | | | This is a step toward allowing pg_test to inform the the coalescing code to reduce the size of requests so they may fit in whatever scheme the pg_test callback wants to define. For now, just return the size of the request if there is space, or 0 if there is not. This shouldn't change any behavior as it acts the same as when the pg_test functions returned bool. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: remove unused arg from nfs_create_requestWeston Andros Adamson2014-05-295-12/+6Star
| | | | | | | @inode is passed but not used. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: clean up PG_* flagsWeston Andros Adamson2014-05-291-6/+4Star
| | | | | | | | Remove unused flags PG_NEED_COMMIT and PG_NEED_RESCHED. Add comments describing how each flag is used. Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* pnfs: fix race in filelayout commit pathWeston Andros Adamson2014-05-291-4/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hold the lock while modifying commit info dataserver buckets. The following oops can be reproduced by running iozone for a while against a 2 DS pynfs filelayout server. general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache CPU: 0 PID: 903 Comm: iozone Not tainted 3.15.0-rc1-branch-dros_testing+ #44 Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference task: ffff880078164480 ti: ffff88006e972000 task.ti: ffff88006e972000 RIP: 0010:[<ffffffffa01936e1>] [<ffffffffa01936e1>] nfs_init_commit+0x22/0x RSP: 0018:ffff88006e973d30 EFLAGS: 00010246 RAX: ffff88006e973e00 RBX: ffff88006e828800 RCX: ffff88006e973e10 RDX: 0000000000000000 RSI: ffff88006e973e00 RDI: dead4ead00000000 RBP: ffff88006e973d38 R08: ffff88006e8289d8 R09: 0000000000000000 R10: ffff88006e8289d8 R11: 0000000000016988 R12: ffff88006e973b98 R13: ffff88007a0a6648 R14: ffff88006e973e10 R15: ffff88006e828800 FS: 00007f2ce396b740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f03278a1000 CR3: 0000000079043000 CR4: 00000000001407f0 Stack: ffff88006e8289d8 ffff88006e973da8 ffffffffa00f144f ffff88006e9478c0 ffff88006e973e00 ffff88006de21080 0000000100000002 ffff880079be6c48 ffff88006e973d70 ffff88006e973d70 ffff88006e973e10 ffff88006de21080 Call Trace: [<ffffffffa00f144f>] filelayout_commit_pagelist+0x1ae/0x34a [nfs_layout_nfsv [<ffffffffa0194f72>] nfs_generic_commit_list+0x92/0xc4 [nfs] [<ffffffffa0195053>] nfs_commit_inode+0xaf/0x114 [nfs] [<ffffffffa01892bd>] nfs_file_fsync_commit+0x82/0xbe [nfs] [<ffffffffa01ceb0d>] nfs4_file_fsync+0x59/0x9b [nfsv4] [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20 [<ffffffff8114ee60>] vfs_fsync+0x1c/0x1e [<ffffffffa01891c2>] nfs_file_flush+0x7f/0x84 [nfs] [<ffffffff81127a43>] filp_close+0x3c/0x72 [<ffffffff81140e12>] __close_fd+0x82/0x9a [<ffffffff81127a9c>] SyS_close+0x23/0x4c [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b Code: 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 8 RIP [<ffffffffa01936e1>] nfs_init_commit+0x22/0xe1 [nfs] RSP <ffff88006e973d30> ---[ end trace 732fe6419b235e2f ]--- Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Weston Andros Adamson <dros@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common nfs_pageio_ops structAnna Schumaker2014-05-294-17/+11Star
| | | | | | | | At this point the read and write structures look identical, so combine them into something shared by both. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common generic_pg_pgios()Anna Schumaker2014-05-294-51/+28Star
| | | | | | | | What we have here is two functions that look identical. Let's share some more code! Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common multiple_pgios() functionAnna Schumaker2014-05-294-60/+25Star
| | | | | | | | Once again, these two functions look identical in the read and write case. Time to combine them together! Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common initiate_pgio() functionAnna Schumaker2014-05-296-94/+66Star
| | | | | | | | Most of this code is the same for both the read and write paths, so combine everything and use the rw_ops when necessary. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a generic_pgio functionAnna Schumaker2014-05-295-192/+106Star
| | | | | | | | | These functions are almost identical on both the read and write side. FLUSH_COND_STABLE will never be set for the read path, so leaving it in the generic code won't hurt anything. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common pgio_error functionAnna Schumaker2014-05-294-42/+29Star
| | | | | | | | At this point, the read and write versions of this function look identical so both should use the same function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common rpcsetup function for reads and writesAnna Schumaker2014-05-294-66/+53Star
| | | | | | | | Write adds a little bit of code dealing with flush flags, but since "how" will always be 0 when reading we can share the code. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common rpc_call_ops structAnna Schumaker2014-05-294-24/+13Star
| | | | | | | | The read and write paths set up this struct in exactly the same way, so create a single shared struct. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common nfs_pgio_result_common functionAnna Schumaker2014-05-296-50/+54
| | | | | | | | Combining these functions will let me make a single nfs_rw_common_ops struct (see the next patch). Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common pgio_rpc_prepare functionAnna Schumaker2014-05-299-69/+46Star
| | | | | | | | The read and write paths do exactly the same thing for the rpc_prepare rpc_op. This patch combines them together into a single function. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common rw_header_alloc and rw_header_free functionAnna Schumaker2014-05-297-45/+72
| | | | | | | | | | I create a new struct nfs_rw_ops to decide the differences between reads and writes. This struct will be set when initializing a new nfs_pgio_descriptor, and then passed on to the nfs_rw_header when a new header is allocated. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common pgio_alloc and pgio_release functionAnna Schumaker2014-05-295-106/+74Star
| | | | | | | | These functions are identical for the read and write paths so they can be combined. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Move the write verifier into the nfs_pgio_headerAnna Schumaker2014-05-293-8/+6Star
| | | | | | | | | The header had a pointer to the verifier that was set from the old write data struct. We don't need to keep the pointer around now that we have shared structures. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common read and write header structAnna Schumaker2014-05-295-24/+19Star
| | | | | | | | The only difference is the write verifier field, but we can keep that for a little bit longer. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common read and write data structAnna Schumaker2014-05-2917-164/+150Star
| | | | | | | | At this point, the only difference between nfs_read_data and nfs_write_data is the write verifier. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common results structure for reads and writesAnna Schumaker2014-05-296-33/+26Star
| | | | | | | | | Reads and writes have very similar results. This patch combines the two structs together with comments to show where the differing fields are used. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Create a common argument structure for reads and writesAnna Schumaker2014-05-297-44/+37Star
| | | | | | | | Reads and writes have very similar arguments. This patch combines them together and documents the few fields used only by write. Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: remove ->read_pageio_init from rpc opsChristoph Hellwig2014-05-289-35/+17Star
| | | | | | | | | | | | | The read_pageio_init method is just a very convoluted way to grab the right nfs_pageio_ops vector. The vector to chose is not a choice of protocol version, but just a pNFS vs MDS I/O choice that can simply be done inside nfs_pageio_init_read based on the presence of a layout driver, and a new force_mds flag to the special case of falling back to MDS I/O on a pNFS-capable volume. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: remove ->write_pageio_init from rpc opsChristoph Hellwig2014-05-289-39/+17Star
| | | | | | | | | | | | | The write_pageio_init method is just a very convoluted way to grab the right nfs_pageio_ops vector. The vector to chose is not a choice of protocol version, but just a pNFS vs MDS I/O choice that can simply be done inside nfs_pageio_init_write based on the presence of a layout driver, and a new force_mds flag to the special case of falling back to MDS I/O on a pNFS-capable volume. Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* nfs: commit layouts in fdatasyncChristoph Hellwig2014-05-281-2/+1Star
| | | | | | | | | | | | "fdatasync() is similar to fsync(), but does not flush modified metadata unless that metadata is needed in order to allow a subsequent data retrieval to be correctly handled." We absolutely need to commit the layouts to be able to retrieve the data in case either the client, the server or the storage subsystem go down. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* SUNRPC: Fix a module reference issue in rpcsec_gssTrond Myklebust2014-05-181-3/+1Star
| | | | | | | | We're not taking a reference in the case where _gss_mech_get_by_pseudoflavor loops without finding the correct rpcsec_gss flavour, so why are we releasing it? Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Don't ignore suid/sgid bit changes after a successful writeTrond Myklebust2014-04-161-2/+33
| | | | | | | | If we suspect that the server may have cleared the suid/sgid bit, then mark the inode for revalidation. Reported-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Don't declare inode uptodate unless all attributes were checkedTrond Myklebust2014-04-161-9/+17
| | | | | | | | | | | | Fix a bug, whereby nfs_update_inode() was declaring the inode to be up to date despite not having checked all the attributes. The bug occurs because the temporary variable in which we cache the validity information is 'sanitised' before reapplying to nfsi->cache_validity. Reported-by: Kinglong Mee <kinglongmee@gmail.com> Cc: stable@vger.kernel.org # 3.5+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* NFS: Fix memroy leak for double mountsKinglong Mee2014-04-151-1/+2
| | | | | | | | | When double mounting same nfs filesystem, the devname saved in d_fsdata will be lost.The second mount should not change the devname that be saved in d_fsdata. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
* Linux 3.15-rc1Linus Torvalds2014-04-131-2/+2
|
* mm: Initialize error in shmem_file_aio_read()Geert Uytterhoeven2014-04-131-1/+1
| | | | | | | | | | | | | | | | | Some versions of gcc even warn about it: mm/shmem.c: In function ‘shmem_file_aio_read’: mm/shmem.c:1414: warning: ‘error’ may be used uninitialized in this function If the loop is aborted during the first iteration by one of the two first break statements, error will be uninitialized. Introduced by commit 6e58e79db8a1 ("introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read()"). Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cifs: Use min_t() when comparing "size_t" and "unsigned long"Geert Uytterhoeven2014-04-131-1/+1
| | | | | | | | | | | | | | | | On 32 bit, size_t is "unsigned int", not "unsigned long", causing the following warning when comparing with PAGE_SIZE, which is always "unsigned long": fs/cifs/file.c: In function ‘cifs_readdata_to_iov’: fs/cifs/file.c:2757: warning: comparison of distinct pointer types lacks a cast Introduced by commit 7f25bba819a3 ("cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov()"), which changed the signedness of "remaining" and the code from min_t() to min(). Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'slab/next' of ↵Linus Torvalds2014-04-135-84/+128
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux Pull slab changes from Pekka Enberg: "The biggest change is byte-sized freelist indices which reduces slab freelist memory usage: https://lkml.org/lkml/2013/12/2/64" * 'slab/next' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: mm: slab/slub: use page->list consistently instead of page->lru mm/slab.c: cleanup outdated comments and unify variables naming slab: fix wrongly used macro slub: fix high order page allocation problem with __GFP_NOFAIL slab: Make allocations with GFP_ZERO slightly more efficient slab: make more slab management structure off the slab slab: introduce byte sized index for the freelist of a slab slab: restrict the number of objects in a slab slab: introduce helper functions to get/set free object slab: factor out calculate nr objects in cache_estimate
| * mm: slab/slub: use page->list consistently instead of page->lruDave Hansen2014-04-113-8/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'struct page' has two list_head fields: 'lru' and 'list'. Conveniently, they are unioned together. This means that code can use them interchangably, which gets horribly confusing like with this nugget from slab.c: > list_del(&page->lru); > if (page->active == cachep->num) > list_add(&page->list, &n->slabs_full); This patch makes the slab and slub code use page->lru universally instead of mixing ->list and ->lru. So, the new rule is: page->lru is what the you use if you want to keep your page on a list. Don't like the fact that it's not called ->list? Too bad. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * mm/slab.c: cleanup outdated comments and unify variables namingJianyu Zhan2014-04-011-34/+32Star
| | | | | | | | | | | | | | | | | | | | | | | | As time goes, the code changes a lot, and this leads to that some old-days comments scatter around , which instead of faciliating understanding, but make more confusion. So this patch cleans up them. Also, this patch unifies some variables naming. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Jianyu Zhan <nasa4836@gmail.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slab: fix wrongly used macroJoonsoo Kim2014-04-011-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 'slab: restrict the number of objects in a slab' uses __builtin_constant_p() on #if macro. It is wrong usage of builtin function, but it is compiled on x86 without any problem, so I can't find it before 0 day build system find it. This commit fixes the situation by using KMALLOC_MIN_SIZE, instead of KMALLOC_SHIFT_LOW. KMALLOC_SHIFT_LOW is parsed to ilog2() on some architecture and this ilog2() uses __builtin_constant_p() and results in the problem. This problem would disappear by using KMALLOC_MIN_SIZE, since it is just constant. Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slub: fix high order page allocation problem with __GFP_NOFAILJoonsoo Kim2014-03-271-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SLUB already try to allocate high order page with clearing __GFP_NOFAIL. But, when allocating shadow page for kmemcheck, it missed clearing the flag. This trigger WARN_ON_ONCE() reported by Christian Casteyde. https://bugzilla.kernel.org/show_bug.cgi?id=65991 https://lkml.org/lkml/2013/12/3/764 This patch fix this situation by using same allocation flag as original allocation. Reported-by: Christian Casteyde <casteyde.christian@free.fr> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slab: Make allocations with GFP_ZERO slightly more efficientJoe Perches2014-02-081-8/+8
| | | | | | | | | | | | | | | | | | | | Use the likely mechanism already around valid pointer tests to better choose when to memset to 0 allocations with __GFP_ZERO Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slab: make more slab management structure off the slabJoonsoo Kim2014-02-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, the size of the freelist for the slab management diminish, so that the on-slab management structure can waste large space if the object of the slab is large. Consider a 128 byte sized slab. If on-slab is used, 31 objects can be in the slab. The size of the freelist for this case would be 31 bytes so that 97 bytes, that is, more than 75% of object size, are wasted. In a 64 byte sized slab case, no space is wasted if we use on-slab. So set off-slab determining constraint to 128 bytes. Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slab: introduce byte sized index for the freelist of a slabJoonsoo Kim2014-02-081-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the freelist of a slab consist of unsigned int sized indexes. Since most of slabs have less number of objects than 256, large sized indexes is needless. For example, consider the minimum kmalloc slab. It's object size is 32 byte and it would consist of one page, so 256 indexes through byte sized index are enough to contain all possible indexes. There can be some slabs whose object size is 8 byte. We cannot handle this case with byte sized index, so we need to restrict minimum object size. Since these slabs are not major, wasted memory from these slabs would be negligible. Some architectures' page size isn't 4096 bytes and rather larger than 4096 bytes (One example is 64KB page size on PPC or IA64) so that byte sized index doesn't fit to them. In this case, we will use two bytes sized index. Below is some number for this patch. * Before * kmalloc-512 525 640 512 8 1 : tunables 54 27 0 : slabdata 80 80 0 kmalloc-256 210 210 256 15 1 : tunables 120 60 0 : slabdata 14 14 0 kmalloc-192 1016 1040 192 20 1 : tunables 120 60 0 : slabdata 52 52 0 kmalloc-96 560 620 128 31 1 : tunables 120 60 0 : slabdata 20 20 0 kmalloc-64 2148 2280 64 60 1 : tunables 120 60 0 : slabdata 38 38 0 kmalloc-128 647 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0 kmalloc-32 11360 11413 32 113 1 : tunables 120 60 0 : slabdata 101 101 0 kmem_cache 197 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0 * After * kmalloc-512 521 648 512 8 1 : tunables 54 27 0 : slabdata 81 81 0 kmalloc-256 208 208 256 16 1 : tunables 120 60 0 : slabdata 13 13 0 kmalloc-192 1029 1029 192 21 1 : tunables 120 60 0 : slabdata 49 49 0 kmalloc-96 529 589 128 31 1 : tunables 120 60 0 : slabdata 19 19 0 kmalloc-64 2142 2142 64 63 1 : tunables 120 60 0 : slabdata 34 34 0 kmalloc-128 660 682 128 31 1 : tunables 120 60 0 : slabdata 22 22 0 kmalloc-32 11716 11780 32 124 1 : tunables 120 60 0 : slabdata 95 95 0 kmem_cache 197 210 192 21 1 : tunables 120 60 0 : slabdata 10 10 0 kmem_caches consisting of objects less than or equal to 256 byte have one or more objects than before. In the case of kmalloc-32, we have 11 more objects, so 352 bytes (11 * 32) are saved and this is roughly 9% saving of memory. Of couse, this percentage decreases as the number of objects in a slab decreases. Here are the performance results on my 4 cpus machine. * Before * Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs): 229,945,138 cache-misses ( +- 0.23% ) 11.627897174 seconds time elapsed ( +- 0.14% ) * After * Performance counter stats for 'perf bench sched messaging -g 50 -l 1000' (10 runs): 218,640,472 cache-misses ( +- 0.42% ) 11.504999837 seconds time elapsed ( +- 0.21% ) cache-misses are reduced by this patchset, roughly 5%. And elapsed times are improved by 1%. Acked-by: Christoph Lameter <cl@linux.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slab: restrict the number of objects in a slabJoonsoo Kim2014-02-082-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To prepare to implement byte sized index for managing the freelist of a slab, we should restrict the number of objects in a slab to be less or equal to 256, since byte only represent 256 different values. Setting the size of object to value equal or more than newly introduced SLAB_OBJ_MIN_SIZE ensures that the number of objects in a slab is less or equal to 256 for a slab with 1 page. If page size is rather larger than 4096, above assumption would be wrong. In this case, we would fall back on 2 bytes sized index. If minimum size of kmalloc is less than 16, we use it as minimum object size and give up this optimization. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slab: introduce helper functions to get/set free objectJoonsoo Kim2014-02-081-7/+13
| | | | | | | | | | | | | | | | | | | | In the following patches, to get/set free objects from the freelist is changed so that simple casting doesn't work for it. Therefore, introduce helper functions. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
| * slab: factor out calculate nr objects in cache_estimateJoonsoo Kim2014-02-081-21/+27
| | | | | | | | | | | | | | | | | | | | | | This logic is not simple to understand so that making separate function helping readability. Additionally, we can use this change in the following patch which implement for freelist to have another sized index in according to nr objects. Acked-by: Christoph Lameter <cl@linux.com> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Pekka Enberg <penberg@kernel.org>
* | Merge branch 'misc' of ↵Linus Torvalds2014-04-136-120/+196
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild Pull misc kbuild changes from Michal Marek: "Here is the non-critical part of kbuild: - One bogus coccinelle check removed, one check fixed not to suggest the obsolete PTR_RET macro - scripts/tags.sh does not index the generated *.mod.c files - new objdiff tool to list differences between two versions of an object file - A fix for scripts/bootgraph.pl" * 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: scripts/coccinelle: Use PTR_ERR_OR_ZERO scripts/bootgraph.pl: Add graphic header scripts: objdiff: detect object code changes between two commits Coccicheck: Remove memcpy to struct assignment test scripts/tags.sh: Ignore *.mod.c
| * | scripts/coccinelle: Use PTR_ERR_OR_ZEROSachin Kamat2014-04-081-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | PTR_RET is deprecated. Do not recommend its usage anymore. Use PTR_ERR_OR_ZERO instead. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | scripts/bootgraph.pl: Add graphic headerFabian Frederick2014-04-081-2/+40
| | | | | | | | | | | | | | | | | | | | | Adding -header + help function like other .pl in /scripts. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | scripts: objdiff: detect object code changes between two commitsJason Cooper2014-04-082-1/+142
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | objdiff is useful when doing large code cleanups. For example, when removing checkpatch warnings and errors from new drivers in the staging tree. objdiff can be used in conjunction with a git rebase to confirm that each commit made no changes to the resulting object code. It has the same return values as diff(1). This was written specifically to support adding the skein and threefish cryto drivers to the staging tree. I needed a programmatic way to confirm that commits changing >90% of the lines didn't inadvertently change the code. Temporary files (objdump output) are stored in /path/to/linux/.tmp_objdiff 'make mrproper' will remove this directory. Signed-off-by: Jason Cooper <jason@lakedaemon.net> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | Coccicheck: Remove memcpy to struct assignment testPeter Senna Tschudin2014-03-291-103/+0Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Coccinelle script scripts/coccinelle/misc/memcpy-assign.cocci look for opportunities to replace a call to memcpy by a struct assignment. This patch removes memcpy-assign.cocci as it is not clear that this convention has an impact on the generated code. Signed-off-by: Peter Senna Tschudin <peter.senna@gmail.com> Signed-off-by: Michal Marek <mmarek@suse.cz>
| * | scripts/tags.sh: Ignore *.mod.cPrarit Bhargava2014-02-062-7/+7
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_MODVERSIONS=y results in a .mod.c for every compiled file in the kernel. Issuing a 'make cscope' on a compiled kernel tree results in the cscope files containing *.mod.c files. [prarit@prarit linux]# make cscope [prarit@prarit linux]# cat cscope.files | grep mod.c | wc -l 4807 These files are not useful for cscope and should be ignored. For example, # line filename / context / line 1 105 arch/x86/kvm/kvm-intel.mod.c <<GLOBAL>> { 0x618911fc, __VMLINUX_SYMBOL_STR(numa_node) }, 2 508 drivers/block/mtip32xx/mtip32xx.h <<GLOBAL>> int numa_node; 3 55 drivers/block/mtip32xx/mtip32xx.mod.c <<GLOBAL>> { 0x618911fc, __VMLINUX_SYMBOL_STR(numa_node) }, 4 37 drivers/cpufreq/acpi-cpufreq.mod.c <<GLOBAL>> { 0x618911fc, __VMLINUX_SYMBOL_STR(numa_node) }, <snip> Add an export to RCS_FIND_IGNORE so it can be used in scripts/tags.sh and add explicitly ignore *.mod.c files. Signed-off-by: Prarit Bhargava <prarit@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Michael Opdenacker <michael.opdenacker@free-electrons.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Michal Marek <mmarek@suse.cz>
* | sym53c8xx_2: Set DID_REQUEUE return code when aborting squeueMikulas Patocka2014-04-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes I/O errors with the sym53c8xx_2 driver when the disk returns QUEUE FULL status. When the controller encounters an error (including QUEUE FULL or BUSY status), it aborts all not yet submitted requests in the function sym_dequeue_from_squeue. This function aborts them with DID_SOFT_ERROR. If the disk has full tag queue, the request that caused the overflow is aborted with QUEUE FULL status (and the scsi midlayer properly retries it until it is accepted by the disk), but the sym53c8xx_2 driver aborts the following requests with DID_SOFT_ERROR --- for them, the midlayer does just a few retries and then signals the error up to sd. The result is that disk returning QUEUE FULL causes request failures. The error was reproduced on 53c895 with COMPAQ BD03685A24 disk (rebranded ST336607LC) with command queue 48 or 64 tags. The disk has 64 tags, but under some access patterns it return QUEUE FULL when there are less than 64 pending tags. The SCSI specification allows returning QUEUE FULL anytime and it is up to the host to retry. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Matthew Wilcox <matthew@wil.cx> Cc: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | powerpc: Don't try to set LPCR unless we're in hypervisor modePaul Mackerras2014-04-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8f619b5429d9 ("powerpc/ppc64: Do not turn AIL (reloc-on interrupts) too early") added code to set the AIL bit in the LPCR without checking whether the kernel is running in hypervisor mode. The result is that when the kernel is running as a guest (i.e., under PowerKVM or PowerVM), the processor takes a privileged instruction interrupt at that point, causing a panic. The visible result is that the kernel hangs after printing "returning from prom_init". This fixes it by checking for hypervisor mode being available before setting LPCR. If we are not in hypervisor mode, we enable relocation-on interrupts later in pSeries_setup_arch using the H_SET_MODE hcall. Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>