summaryrefslogtreecommitdiffstats
path: root/Documentation/core-api/protection-keys.rst
diff options
context:
space:
mode:
authorLinus Torvalds2019-07-09 21:34:26 +0200
committerLinus Torvalds2019-07-09 21:34:26 +0200
commite9a83bd2322035ed9d7dcf35753d3f984d76c6a5 (patch)
tree66dc466ff9aec0f9bb7f39cba50a47eab6585559 /Documentation/core-api/protection-keys.rst
parentMerge tag 'printk-for-5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/p... (diff)
parentdocs: automarkup.py: ignore exceptions when seeking for xrefs (diff)
downloadkernel-qcow2-linux-e9a83bd2322035ed9d7dcf35753d3f984d76c6a5.tar.gz
kernel-qcow2-linux-e9a83bd2322035ed9d7dcf35753d3f984d76c6a5.tar.xz
kernel-qcow2-linux-e9a83bd2322035ed9d7dcf35753d3f984d76c6a5.zip
Merge tag 'docs-5.3' of git://git.lwn.net/linux
Pull Documentation updates from Jonathan Corbet: "It's been a relatively busy cycle for docs: - A fair pile of RST conversions, many from Mauro. These create more than the usual number of simple but annoying merge conflicts with other trees, unfortunately. He has a lot more of these waiting on the wings that, I think, will go to you directly later on. - A new document on how to use merges and rebases in kernel repos, and one on Spectre vulnerabilities. - Various improvements to the build system, including automatic markup of function() references because some people, for reasons I will never understand, were of the opinion that :c:func:``function()`` is unattractive and not fun to type. - We now recommend using sphinx 1.7, but still support back to 1.4. - Lots of smaller improvements, warning fixes, typo fixes, etc" * tag 'docs-5.3' of git://git.lwn.net/linux: (129 commits) docs: automarkup.py: ignore exceptions when seeking for xrefs docs: Move binderfs to admin-guide Disable Sphinx SmartyPants in HTML output doc: RCU callback locks need only _bh, not necessarily _irq docs: format kernel-parameters -- as code Doc : doc-guide : Fix a typo platform: x86: get rid of a non-existent document Add the RCU docs to the core-api manual Documentation: RCU: Add TOC tree hooks Documentation: RCU: Rename txt files to rst Documentation: RCU: Convert RCU UP systems to reST Documentation: RCU: Convert RCU linked list to reST Documentation: RCU: Convert RCU basic concepts to reST docs: filesystems: Remove uneeded .rst extension on toctables scripts/sphinx-pre-install: fix out-of-tree build docs: zh_CN: submitting-drivers.rst: Remove a duplicated Documentation/ Documentation: PGP: update for newer HW devices Documentation: Add section about CPU vulnerabilities for Spectre Documentation: platform: Delete x86-laptop-drivers.txt docs: Note that :c:func: should no longer be used ...
Diffstat (limited to 'Documentation/core-api/protection-keys.rst')
-rw-r--r--Documentation/core-api/protection-keys.rst99
1 files changed, 99 insertions, 0 deletions
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst
new file mode 100644
index 000000000000..49d9833af871
--- /dev/null
+++ b/Documentation/core-api/protection-keys.rst
@@ -0,0 +1,99 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Memory Protection Keys
+======================
+
+Memory Protection Keys for Userspace (PKU aka PKEYs) is a feature
+which is found on Intel's Skylake "Scalable Processor" Server CPUs.
+It will be avalable in future non-server parts.
+
+For anyone wishing to test or use this feature, it is available in
+Amazon's EC2 C5 instances and is known to work there using an Ubuntu
+17.04 image.
+
+Memory Protection Keys provides a mechanism for enforcing page-based
+protections, but without requiring modification of the page tables
+when an application changes protection domains. It works by
+dedicating 4 previously ignored bits in each page table entry to a
+"protection key", giving 16 possible keys.
+
+There is also a new user-accessible register (PKRU) with two separate
+bits (Access Disable and Write Disable) for each key. Being a CPU
+register, PKRU is inherently thread-local, potentially giving each
+thread a different set of protections from every other thread.
+
+There are two new instructions (RDPKRU/WRPKRU) for reading and writing
+to the new register. The feature is only available in 64-bit mode,
+even though there is theoretically space in the PAE PTEs. These
+permissions are enforced on data access only and have no effect on
+instruction fetches.
+
+Syscalls
+========
+
+There are 3 system calls which directly interact with pkeys::
+
+ int pkey_alloc(unsigned long flags, unsigned long init_access_rights)
+ int pkey_free(int pkey);
+ int pkey_mprotect(unsigned long start, size_t len,
+ unsigned long prot, int pkey);
+
+Before a pkey can be used, it must first be allocated with
+pkey_alloc(). An application calls the WRPKRU instruction
+directly in order to change access permissions to memory covered
+with a key. In this example WRPKRU is wrapped by a C function
+called pkey_set().
+::
+
+ int real_prot = PROT_READ|PROT_WRITE;
+ pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
+ ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+ ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
+ ... application runs here
+
+Now, if the application needs to update the data at 'ptr', it can
+gain access, do the update, then remove its write access::
+
+ pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
+ *ptr = foo; // assign something
+ pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
+
+Now when it frees the memory, it will also free the pkey since it
+is no longer in use::
+
+ munmap(ptr, PAGE_SIZE);
+ pkey_free(pkey);
+
+.. note:: pkey_set() is a wrapper for the RDPKRU and WRPKRU instructions.
+ An example implementation can be found in
+ tools/testing/selftests/x86/protection_keys.c.
+
+Behavior
+========
+
+The kernel attempts to make protection keys consistent with the
+behavior of a plain mprotect(). For instance if you do this::
+
+ mprotect(ptr, size, PROT_NONE);
+ something(ptr);
+
+you can expect the same effects with protection keys when doing this::
+
+ pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
+ pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
+ something(ptr);
+
+That should be true whether something() is a direct access to 'ptr'
+like::
+
+ *ptr = foo;
+
+or when the kernel does the access on the application's behalf like
+with a read()::
+
+ read(fd, ptr, 1);
+
+The kernel will send a SIGSEGV in both cases, but si_code will be set
+to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
+the plain mprotect() permissions are violated.