summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* mm: export various functions for the benefit of DAXMatthew Wilcox2015-09-092-6/+11
| | | | | | | | | | | | To use the huge zero page in DAX, we need these functions exported. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add a pmd_fault handlerMatthew Wilcox2015-09-092-6/+26
| | | | | | | | | | | | Allow non-anonymous VMAs to provide huge pages in response to a page fault. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* thp: prepare for DAX huge pagesMatthew Wilcox2015-09-092-19/+33
| | | | | | | | | | | | | | Add a vma_is_dax() helper macro to test whether the VMA is DAX, and use it in zap_huge_pmd() and __split_huge_page_pmd(). [akpm@linux-foundation.org: fix build] Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* dax: revert userfaultfd changeAndrew Morton2015-09-091-1/+4
| | | | | | | | | | | Undo the change which "userfaultfd: call handle_userfault() for userfaultfd_missing() faults" made to set_huge_zero_page(). DAX will need that return value. Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* dax: move DAX-related functions to a new headerMatthew Wilcox2015-09-099-14/+28
| | | | | | | | | | | | | | | | | | | In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is defined in linux/mm.h. Given that we don't want to include <linux/mm.h> in <linux/fs.h>, the easiest solution is to move the DAX-related functions to a new header, <linux/dax.h>. We could also have moved VM_FAULT_* definitions to a new header, or a different header that isn't quite such a boil-the-ocean header as <linux/mm.h>, but this felt like the best option. Signed-off-by: Matthew Wilcox <willy@linux.intel.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* thp: vma_adjust_trans_huge(): adjust file-backed VMA tooKirill A. Shutemov2015-09-092-11/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This series of patches adds support for using PMD page table entries to map DAX files. We expect NV-DIMMs to start showing up that are many gigabytes in size and the memory consumption of 4kB PTEs will be astronomical. The patch series leverages much of the Transparant Huge Pages infrastructure, going so far as to borrow one of Kirill's patches from his THP page cache series. This patch (of 10): Since we're going to have huge pages in page cache, we need to call adjust file-backed VMA, which potentially can contain huge pages. For now we call it for all VMAs. Probably later we will need to introduce a flag to indicate that the VMA has huge pages. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Acked-by: Hillf Danton <dhillf@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mremap: fix the wrong !vma->vm_file check in copy_vma()Oleg Nesterov2015-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test-case: #define _GNU_SOURCE #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <assert.h> void *find_vdso_vaddr(void) { FILE *perl; char buf[32] = {}; perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;" "/^(.*?)-.*vdso/ && print hex $1 while <>'", "r"); fread(buf, sizeof(buf), 1, perl); fclose(perl); return (void *)atol(buf); } #define PAGE_SIZE 4096 void *get_unmapped_area(void) { void *p = mmap(0, PAGE_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1,0); assert(p != MAP_FAILED); munmap(p, PAGE_SIZE); return p; } char save[2][PAGE_SIZE]; int main(void) { void *vdso = find_vdso_vaddr(); void *page[2]; assert(vdso); memcpy(save, vdso, sizeof (save)); // force another fault on the next check assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0); page[0] = mremap(vdso, PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE, get_unmapped_area()); page[1] = mremap(vdso + PAGE_SIZE, PAGE_SIZE, PAGE_SIZE, MREMAP_FIXED | MREMAP_MAYMOVE, get_unmapped_area()); assert(page[0] != MAP_FAILED && page[1] != MAP_FAILED); printf("match: %d %d\n", !memcmp(save[0], page[0], PAGE_SIZE), !memcmp(save[1], page[1], PAGE_SIZE)); return 0; } fails without this patch. Before the previous commit it gets the wrong page, now it segfaults (which is imho better). This is because copy_vma() wrongly assumes that if vma->vm_file == NULL is irrelevant until the first fault which will use do_anonymous_page(). This is obviously wrong for the special mapping. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mmap: fix the usage of ->vm_pgoff in special_mapping pathsOleg Nesterov2015-09-091-10/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Test-case: #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <assert.h> void *find_vdso_vaddr(void) { FILE *perl; char buf[32] = {}; perl = popen("perl -e 'open STDIN,qq|/proc/@{[getppid]}/maps|;" "/^(.*?)-.*vdso/ && print hex $1 while <>'", "r"); fread(buf, sizeof(buf), 1, perl); fclose(perl); return (void *)atol(buf); } #define PAGE_SIZE 4096 int main(void) { void *vdso = find_vdso_vaddr(); assert(vdso); // of course they should differ, and they do so far printf("vdso pages differ: %d\n", !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE)); // split into 2 vma's assert(mprotect(vdso, PAGE_SIZE, PROT_READ) == 0); // force another fault on the next check assert(madvise(vdso, 2 * PAGE_SIZE, MADV_DONTNEED) == 0); // now they no longer differ, the 2nd vm_pgoff is wrong printf("vdso pages differ: %d\n", !!memcmp(vdso, vdso + PAGE_SIZE, PAGE_SIZE)); return 0; } Output: vdso pages differ: 1 vdso pages differ: 0 This is because split_vma() correctly updates ->vm_pgoff, but the logic in insert_vm_struct() and special_mapping_fault() is absolutely broken, so the fault at vdso + PAGE_SIZE return the 1st page. The same happens if you simply unmap the 1st page. special_mapping_fault() does: pgoff = vmf->pgoff - vma->vm_pgoff; and this is _only_ correct if vma->vm_start mmaps the first page from ->vm_private_data array. vdso or any other user of install_special_mapping() is not anonymous, it has the "backing storage" even if it is just the array of pages. So we actually need to make vm_pgoff work as an offset in this array. Note: this also allows to fix another problem: currently gdb can't access "[vvar]" memory because in this case special_mapping_fault() doesn't work. Now that we can use ->vm_pgoff we can implement ->access() and fix this. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: introduce vma_is_anonymous(vma) helperOleg Nesterov2015-09-092-4/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | special_mapping_fault() is absolutely broken. It seems it was always wrong, but this didn't matter until vdso/vvar started to use more than one page. And after this change vma_is_anonymous() becomes really trivial, it simply checks vm_ops == NULL. However, I do think the helper makes sense. There are a lot of ->vm_ops != NULL checks, the helper makes the caller's code more understandable (self-documented) and this is more grep-friendly. This patch (of 3): Preparation. Add the new simple helper, vma_is_anonymous(vma), and change handle_pte_fault() to use it. It will have more users. The name is not accurate, say a hpet_mmap()'ed vma is not anonymous. Perhaps it should be named vma_has_fault() instead. But it matches the logic in mmap.c/memory.c (see next changes). "True" just means that a page fault will use do_anonymous_page(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* selftests/userfaultfd: fix compiler warnings on 32-bitGeert Uytterhoeven2015-09-091-3/+6
| | | | | | | | | | | | | | | | | | | | On 32-bit: userfaultfd.c: In function 'locking_thread': userfaultfd.c:152: warning: left shift count >= width of type userfaultfd.c: In function 'uffd_poll_thread': userfaultfd.c:295: warning: cast to pointer from integer of different size userfaultfd.c: In function 'uffd_read_thread': userfaultfd.c:332: warning: cast to pointer from integer of different size Fix the shift warning by splitting the shift in two parts, and the integer/pointer warnigns by adding intermediate casts to "unsigned long". Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup: fix seq_show_option merge with legacy_nameKees Cook2015-09-091-1/+1
| | | | | | | | | | | | | | When seq_show_option (commit a068acf2ee77: "fs: create and use seq_show_option for escaping") was merged, it did not correctly collide with cgroup's addition of legacy_name (commit 3e1d2eed39d8: "cgroup: introduce cgroup_subsys->legacy_name") changes. This fixes the reported name. Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2015-09-0754-1089/+1321
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client updates from Trond Myklebust: "Highlights include: Stable patches: - Fix atomicity of pNFS commit list updates - Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY) - nfs_set_pgio_error sometimes misses errors - Fix a thinko in xs_connect() - Fix borkage in _same_data_server_addrs_locked() - Fix a NULL pointer dereference of migration recovery ops for v4.2 client - Don't let the ctime override attribute barriers. - Revert "NFSv4: Remove incorrect check in can_open_delegated()" - Ensure flexfiles pNFS driver updates the inode after write finishes - flexfiles must not pollute the attribute cache with attrbutes from the DS - Fix a protocol error in layoutreturn - Fix a protocol issue with NFSv4.1 CLOSE stateids Bugfixes + cleanups - pNFS blocks bugfixes from Christoph - Various cleanups from Anna - More fixes for delegation corner cases - Don't fsync twice for O_SYNC/IS_SYNC files - Fix pNFS and flexfiles layoutstats bugs - pnfs/flexfiles: avoid duplicate tracking of mirror data - pnfs: Fix layoutget/layoutreturn/return-on-close serialisation issues - pnfs/flexfiles: error handling retries a layoutget before fallback to MDS Features: - Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from Kinglong - More RDMA client transport improvements from Chuck - Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr() verbs from the SUNRPC, Lustre and core infiniband tree. - Optimise away the close-to-open getattr if there is no cached data" * tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits) NFSv4: Respect the server imposed limit on how many changes we may cache NFSv4: Express delegation limit in units of pages Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files" NFS: Optimise away the close-to-open getattr if there is no cached data NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error() nfs: Remove unneeded checking of the return value from scnprintf nfs: Fix truncated client owner id without proto type NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid() NFSv4.1/flexfiles: Fix freeing of mirrors NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file NFSv4.1/pnfs: Handle LAYOUTGET return values correctly NFSv4.1/pnfs: Don't ask for a read layout for an empty file. NFSv4.1: Fix a protocol issue with CLOSE stateids NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors SUNRPC: Prevent SYN+SYNACK+RST storms SUNRPC: xs_reset_transport must mark the connection as disconnected NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload ...
| * NFSv4: Respect the server imposed limit on how many changes we may cacheTrond Myklebust2015-09-075-11/+55
| | | | | | | | | | | | | | | | | | | | | | The NFSv4 delegation spec allows the server to tell a client to limit how much data it cache after the file is closed. In return, the server guarantees enough free space to avoid ENOSPC situations, etc. Prior to this patch, we assumed we could always cache aggressively after close. Unfortunately, this causes problems with servers that set the limit to 0 and therefore do not offer any ENOSPC guarantees. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4: Express delegation limit in units of pagesTrond Myklebust2015-09-074-10/+14
| | | | | | | | | | | | | | | | Since we're tracking modifications to the page cache on a per-page basis, it makes sense to express the limit to how much we may cache in units of pages. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"Trond Myklebust2015-09-041-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf. This commit causes a NFSv4 regression in that close()+unlink() can end up failing. The reason is that we no longer have a guarantee that the CLOSE has completed on the server, meaning that the subsequent call to REMOVE may fail with NFS4ERR_FILE_OPEN if the server implements Windows unlink() semantics. Reported-by: <Olga Kornievskaia <aglo@umich.edu> Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Optimise away the close-to-open getattr if there is no cached dataTrond Myklebust2015-09-041-3/+10
| | | | | | | | | | | | | | If there is no cached data, then there is no need to track the file change attribute on close. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cbTrond Myklebust2015-09-031-11/+9Star
| | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()Trond Myklebust2015-09-031-9/+1Star
| | | | | | | | | | | | | | When I/O cannot complete due to a fatal error on the DS, ensure that we invalidate the corresponding layout segment and return it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * nfs: Remove unneeded checking of the return value from scnprintfKinglong Mee2015-09-021-18/+1Star
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The return value from scnprintf always less than the buffer length. So, result >= len always false. This patch removes those checking. int vscnprintf(char *buf, size_t size, const char *fmt, va_list args) { int i; i = vsnprintf(buf, size, fmt, args); if (likely(i < size)) return i; if (size != 0) return size - 1; return 0; } Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * nfs: Fix truncated client owner id without proto typeKinglong Mee2015-09-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The length of "Linux NFSv4.0 " is 14, not 10. Without this patch, I get a truncated client owner id as, "Linux NFSv4.0 ::1/::1" With this patch, "Linux NFSv4.0 ::1/::1 tcp" Fixes: a319268891 ("nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer") Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalidTrond Myklebust2015-09-021-15/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | If a read-write layout has an invalid mirror, then we should mark it as invalid, and return it. If a read-only layout has an invalid mirror, then mark it as invalid and check if there is still at least one valid mirror before we return it. Note: Also fix incorrect use of pnfs_generic_mark_devid_invalid(). We really want nfs4_mark_deviceid_unavailable(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are validTrond Myklebust2015-09-021-2/+28
| | | | | | | | | | | | | | Unlike read layouts, the writeable layout cannot fall back to using only one of the mirrors. It need to write to all of them. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()Trond Myklebust2015-09-021-2/+2
| | | | | | | | | | | | | | | | Unlike the files layout, flexfiles does not test for the NFS_DEVICEID_INVALID flag. Instead it relies on NFS_DEVICEID_UNAVAILABLE. Fix is to replace with nfs4_mark_deviceid_unavailable(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Fix freeing of mirrorsTrond Myklebust2015-09-011-12/+2Star
| | | | | | | | | | | | | | | | | | | | | | | | Mirrors are now shared objects, so we should not be freeing them directly inside ff_layout_free_lseg(). We should already be doing the right thing in _ff_layout_free_lseg(), so just let it handle things. Also ensure that ff_layout_free_mirror() frees the RPC credential if it is set. Fixes: 28a0d72c6867 ("Add refcounting to struct nfs4_ff_layout_mirror") Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of fileTrond Myklebust2015-08-311-0/+9
| | | | | | | | | | | | | | If we have a read layout, then sanity check the minimal layout length so that it does not extend beyond the end of file. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pnfs: Handle LAYOUTGET return values correctlyTrond Myklebust2015-08-311-1/+14
| | | | | | | | | | | | | | | | | | According to RFC5661 section 18.43.3, if the server cannot satisfy the loga_minlength argument to LAYOUTGET, there are 2 cases: 1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER 2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pnfs: Don't ask for a read layout for an empty file.Trond Myklebust2015-08-311-0/+3
| | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1: Fix a protocol issue with CLOSE stateidsTrond Myklebust2015-08-311-5/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | According to RFC5661 Section 18.2.4, CLOSE is supposed to return the zero stateid. This means that nfs_clear_open_stateid_locked() cannot assume that the result stateid will always match the 'other' field of the existing open stateid when trying to determine a race with a parallel OPEN. Instead, we look at the argument, and check for matches. Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errorsTrond Myklebust2015-08-301-8/+16
| | | | | | | | | | | | | | | | | | If the file was fenced and/or has been deleted on the DS, then we want to retry pNFS after a layoutreturn with error report. If the server cannot fix the problem, then we rely on it to tell us so in the response to the LAYOUTGET. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * SUNRPC: Prevent SYN+SYNACK+RST stormsTrond Myklebust2015-08-301-0/+2
| | | | | | | | | | | | | | Add a shutdown() call before we release the socket in order to ensure the reset is sent before we try to reconnect. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * SUNRPC: xs_reset_transport must mark the connection as disconnectedTrond Myklebust2015-08-291-0/+1
| | | | | | | | | | | | | | In case the reconnection attempt fails. Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payloadTrond Myklebust2015-08-281-1/+2
| | | | | | | | | | | | The "FIXME" is outdated. Flexfiles does add a payload. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Fix a protocol error in layoutreturnTrond Myklebust2015-08-281-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to the flexfiles protocol, the layoutreturn should specify an array of errors in the following format: struct ff_ioerr4 { offset4 ffie_offset; length4 ffie_length; stateid4 ffie_stateid; device_error4 ffie_errors<>; }; This patch fixes up the code to ensure that our ffie_errors is indeed encoded as an array (albeit with only a single entry). Reported-by: Tom Haynes <thomas.haynes@primarydata.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1Kinglong Mee2015-08-283-13/+33
| | | | | | | | | | | | | | | | | | | | | | Client sends a SETATTR request after OPEN for updating attributes. For create file with S_ISGID is set, the S_ISGID in SETATTR will be ignored at nfs server as chmod of no PERMISSION. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Get suppattr_exclcreat when getting server capabilitiesKinglong Mee2015-08-284-6/+41
| | | | | | | | | | | | | | | | | | | | Create file with attributs as NFS4_CREATE_EXCLUSIVE4_1 mode depends on suppattr_exclcreat attribut. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Update NFS4_BITMAP_SIZEKinglong Mee2015-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | v4.1/v4.2 have define attributes at word2, nfs client also support security label now. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Make opened as optional argument in _nfs4_do_openKinglong Mee2015-08-282-5/+3Star
| | | | | | | | | | | | | | | | | | | | | | Check opened, only update it when non-NULL. It's not needs define an unused value for the opened when calling _nfs4_do_open. v3, same as v2. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFS: Check size by inode_newsize_ok in nfs_setattrKinglong Mee2015-08-281-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | Set rlimit for NFS's files is useless right now. For local process's rlimit, it should be checked by nfs client. The same, CIFS also call inode_change_ok checking rlimit at its client in cifs_setattr_nounix() and cifs_setattr_unix(). v3, fix bad using of error Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pNFS: pnfs_mark_matching_lsegs_return must notify of layout returnTrond Myklebust2015-08-281-0/+2
| | | | | | | | | | | | | | It's not sufficient to just mark the layout segment for layout return. We also need to set the NFS_LAYOUT_RETURN_BEFORE_CLOSE flag in the layout header. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * nfs42: remove unused declarationPeng Tao2015-08-261-2/+0Star
| | | | | | | | | | | | Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * nfs42: decode_layoutstats does not need res parameterPeng Tao2015-08-261-3/+2Star
| | | | | | | | | | | | Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Peng Tao <tao.peng@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Allow coalescing of new layout segments and existing onesTrond Myklebust2015-08-262-0/+76
| | | | | | | | | | | | | | | | | | | | In order to ensure atomicity of updates, we merge the old layout segments into the new ones, and then invalidate the old ones. Also ensure that we order the list of layout segments so that RO segments are preferred over RW. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pnfs: Allow pNFS device drivers to customise layout segment insertionTrond Myklebust2015-08-262-9/+61
| | | | | | | | | | | | | | | | This is needed in order to allow merging of contiguous layout segments, and also to correct the ordering of layouts for those device drivers that don't necessarily want to place the read-write layouts first. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pnfs: Add sanity check for the layout range returned by the serverTrond Myklebust2015-08-251-1/+24
| | | | | | | | Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/pnfs Improve the packing of struct pnfs_layout_hdrTrond Myklebust2015-08-251-3/+3
| | | | | | | | | | | | | | Eliminate a couple of holes in the structure, and move the 2 atomics into the same cacheline. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfile: ff_layout_remove_mirror can be statickbuild test robot2015-08-251-1/+1
| | | | | | | | | | Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.2/pnfs: Make the layoutstats timer configurableTrond Myklebust2015-08-254-1/+20
| | | | | | | | | | | | | | Allow advanced users to set the layoutstats timer in order to lengthen or shorten the period between layoutstat transmissions to the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfile: Ensure uniqueness of mirrors across layout segmentsTrond Myklebust2015-08-252-29/+99
| | | | | | | | | | | | | | | | Keep the full list of mirrors in the struct nfs4_ff_layout_mirror so that they can be shared among the layout segments that use them. Also ensure that we send out only one copy of the layoutstats per mirror. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Remove mirror backpointer to lseg.Trond Myklebust2015-08-252-14/+12Star
| | | | | | | | | | | | | | When we start sharing mirrors between several lsegs, we won't be able to keep it. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
| * NFSv4.1/flexfiles: Add refcounting to struct nfs4_ff_layout_mirrorTrond Myklebust2015-08-252-9/+28
| | | | | | | | | | | | | | We do want to share mirrors between layout segments, so add a refcount to enable that. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>