From c74fe3940848c6afea83bfbda64a9baf9da547c8 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Fri, 29 Jul 2016 09:30:20 -0700 Subject: pkeys: Add details of system call use to Documentation/ This spells out all of the pkey-related system calls that we have and provides some example code fragments to demonstrate how we expect them to be used. Signed-off-by: Dave Hansen Cc: linux-arch@vger.kernel.org Cc: Dave Hansen Cc: mgorman@techsingularity.net Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163020.59350E33@viggo.jf.intel.com Signed-off-by: Thomas Gleixner --- Documentation/x86/protection-keys.txt | 62 +++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) (limited to 'Documentation') diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt index c281ded1ba16..6da7689601d1 100644 --- a/Documentation/x86/protection-keys.txt +++ b/Documentation/x86/protection-keys.txt @@ -18,6 +18,68 @@ even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches. +=========================== Syscalls =========================== + +There are 2 system calls which directly interact with pkeys: + + int pkey_alloc(unsigned long flags, unsigned long init_access_rights) + int pkey_free(int pkey); + int pkey_mprotect(unsigned long start, size_t len, + unsigned long prot, int pkey); + +Before a pkey can be used, it must first be allocated with +pkey_alloc(). An application calls the WRPKRU instruction +directly in order to change access permissions to memory covered +with a key. In this example WRPKRU is wrapped by a C function +called pkey_set(). + + int real_prot = PROT_READ|PROT_WRITE; + pkey = pkey_alloc(0, PKEY_DENY_WRITE); + ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); + ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey); + ... application runs here + +Now, if the application needs to update the data at 'ptr', it can +gain access, do the update, then remove its write access: + + pkey_set(pkey, 0); // clear PKEY_DENY_WRITE + *ptr = foo; // assign something + pkey_set(pkey, PKEY_DENY_WRITE); // set PKEY_DENY_WRITE again + +Now when it frees the memory, it will also free the pkey since it +is no longer in use: + + munmap(ptr, PAGE_SIZE); + pkey_free(pkey); + +=========================== Behavior =========================== + +The kernel attempts to make protection keys consistent with the +behavior of a plain mprotect(). For instance if you do this: + + mprotect(ptr, size, PROT_NONE); + something(ptr); + +you can expect the same effects with protection keys when doing this: + + pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ); + pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey); + something(ptr); + +That should be true whether something() is a direct access to 'ptr' +like: + + *ptr = foo; + +or when the kernel does the access on the application's behalf like +with a read(): + + read(fd, ptr, 1); + +The kernel will send a SIGSEGV in both cases, but si_code will be set +to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when +the plain mprotect() permissions are violated. + =========================== Config Option =========================== This config option adds approximately 1.5kb of text. and 50 bytes of -- cgit v1.2.3-55-g7522 From acd547b29880800d29222c4632d2c145e401988c Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Fri, 29 Jul 2016 09:30:21 -0700 Subject: x86/pkeys: Default to a restrictive init PKRU PKRU is the register that lets you disallow writes or all access to a given protection key. The XSAVE hardware defines an "init state" of 0 for PKRU: its most permissive state, allowing access/writes to everything. Since we start off all new processes with the init state, we start all processes off with the most permissive possible PKRU. This is unfortunate. If a thread is clone()'d [1] before a program has time to set PKRU to a restrictive value, that thread will be able to write to all data, no matter what pkey is set on it. This weakens any integrity guarantees that we want pkeys to provide. To fix this, we define a very restrictive PKRU to override the XSAVE-provided value when we create a new FPU context. We choose a value that only allows access to pkey 0, which is as restrictive as we can practically make it. This does not cause any practical problems with applications using protection keys because we require them to specify initial permissions for each key when it is allocated, which override the restrictive default. In the end, this ensures that threads which do not know how to manage their own pkey rights can not do damage to data which is pkey-protected. I would have thought this was a pretty contrived scenario, except that I heard a bug report from an MPX user who was creating threads in some very early code before main(). It may be crazy, but folks evidently _do_ it. Signed-off-by: Dave Hansen Cc: linux-arch@vger.kernel.org Cc: Dave Hansen Cc: mgorman@techsingularity.net Cc: arnd@arndb.de Cc: linux-api@vger.kernel.org Cc: linux-mm@kvack.org Cc: luto@kernel.org Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.com Signed-off-by: Thomas Gleixner --- Documentation/kernel-parameters.txt | 5 +++++ arch/x86/include/asm/pkeys.h | 1 + arch/x86/kernel/fpu/core.c | 4 ++++ arch/x86/mm/pkeys.c | 38 +++++++++++++++++++++++++++++++++++++ include/linux/pkeys.h | 4 ++++ 5 files changed, 52 insertions(+) (limited to 'Documentation') diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index a4f4d693e2c1..3725976d0af5 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt @@ -1643,6 +1643,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted. initrd= [BOOT] Specify the location of the initial ramdisk + init_pkru= [x86] Specify the default memory protection keys rights + register contents for all processes. 0x55555554 by + default (disallow access to all but pkey 0). Can + override in debugfs after boot. + inport.irq= [HW] Inport (ATI XL and Microsoft) busmouse driver Format: diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index b406889de0db..34684adb6899 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -100,5 +100,6 @@ extern int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val); extern int __arch_set_user_pkey_access(struct task_struct *tsk, int pkey, unsigned long init_val); +extern void copy_init_pkru_to_fpregs(void); #endif /*_ASM_X86_PKEYS_H */ diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 3fc03a09a93b..47004010ad5d 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -12,6 +12,7 @@ #include #include +#include #define CREATE_TRACE_POINTS #include @@ -505,6 +506,9 @@ static inline void copy_init_fpstate_to_fpregs(void) copy_kernel_to_fxregs(&init_fpstate.fxsave); else copy_kernel_to_fregs(&init_fpstate.fsave); + + if (boot_cpu_has(X86_FEATURE_OSPKE)) + copy_init_pkru_to_fpregs(); } /* diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index e6113bbb56e1..ddc54949078a 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -121,3 +121,41 @@ int __arch_override_mprotect_pkey(struct vm_area_struct *vma, int prot, int pkey */ return vma_pkey(vma); } + +#define PKRU_AD_KEY(pkey) (PKRU_AD_BIT << ((pkey) * PKRU_BITS_PER_PKEY)) + +/* + * Make the default PKRU value (at execve() time) as restrictive + * as possible. This ensures that any threads clone()'d early + * in the process's lifetime will not accidentally get access + * to data which is pkey-protected later on. + */ +u32 init_pkru_value = PKRU_AD_KEY( 1) | PKRU_AD_KEY( 2) | PKRU_AD_KEY( 3) | + PKRU_AD_KEY( 4) | PKRU_AD_KEY( 5) | PKRU_AD_KEY( 6) | + PKRU_AD_KEY( 7) | PKRU_AD_KEY( 8) | PKRU_AD_KEY( 9) | + PKRU_AD_KEY(10) | PKRU_AD_KEY(11) | PKRU_AD_KEY(12) | + PKRU_AD_KEY(13) | PKRU_AD_KEY(14) | PKRU_AD_KEY(15); + +/* + * Called from the FPU code when creating a fresh set of FPU + * registers. This is called from a very specific context where + * we know the FPU regstiers are safe for use and we can use PKRU + * directly. The fact that PKRU is only available when we are + * using eagerfpu mode makes this possible. + */ +void copy_init_pkru_to_fpregs(void) +{ + u32 init_pkru_value_snapshot = READ_ONCE(init_pkru_value); + /* + * Any write to PKRU takes it out of the XSAVE 'init + * state' which increases context switch cost. Avoid + * writing 0 when PKRU was already 0. + */ + if (!init_pkru_value_snapshot && !read_pkru()) + return; + /* + * Override the PKRU state that came from 'init_fpstate' + * with the baseline from the process. + */ + write_pkru(init_pkru_value_snapshot); +} diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index 8ff21125dc8a..e4c08c1ff0c5 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -35,6 +35,10 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey, return 0; } +static inline void copy_init_pkru_to_fpregs(void) +{ +} + #endif /* ! CONFIG_ARCH_HAS_PKEYS */ #endif /* _LINUX_PKEYS_H */ -- cgit v1.2.3-55-g7522 From 6679dac513fd612f34d3a3d99d7b84ed6d5eb5cc Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Tue, 4 Oct 2016 09:38:57 -0700 Subject: x86/pkeys: Update documentation There are a few items that have gotten stale in the protection keys documentation. The config option description only applied to the execute-only support and is not accurate for the current code. There was also a typo with the number of system calls. I also wanted to call out that pkey_set() is not a kernel-provided facility, and where to find an implementation. Signed-off-by: Dave Hansen Cc: Dave Hansen Cc: linux-doc@vger.kernel.org Cc: corbet@lwn.net Link: http://lkml.kernel.org/r/20161004163857.71E0D6F6@viggo.jf.intel.com Signed-off-by: Thomas Gleixner --- Documentation/x86/protection-keys.txt | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) (limited to 'Documentation') diff --git a/Documentation/x86/protection-keys.txt b/Documentation/x86/protection-keys.txt index 6da7689601d1..b64304540821 100644 --- a/Documentation/x86/protection-keys.txt +++ b/Documentation/x86/protection-keys.txt @@ -20,7 +20,7 @@ instruction fetches. =========================== Syscalls =========================== -There are 2 system calls which directly interact with pkeys: +There are 3 system calls which directly interact with pkeys: int pkey_alloc(unsigned long flags, unsigned long init_access_rights) int pkey_free(int pkey); @@ -52,6 +52,10 @@ is no longer in use: munmap(ptr, PAGE_SIZE); pkey_free(pkey); +(Note: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions. + An example implementation can be found in + tools/testing/selftests/x86/protection_keys.c) + =========================== Behavior =========================== The kernel attempts to make protection keys consistent with the @@ -79,11 +83,3 @@ with a read(): The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated. - -=========================== Config Option =========================== - -This config option adds approximately 1.5kb of text. and 50 bytes of -data to the executable. A workload which does large O_DIRECT reads -of holes in XFS files was run to exercise get_user_pages_fast(). No -performance delta was observed with the config option -enabled or disabled. -- cgit v1.2.3-55-g7522